Re: [OMPI devel] RTLD_GLOBAL question

2014-12-04 Thread Artem Polyakov
2014-12-04 17:29 GMT+06:00 Jeff Squyres (jsquyres) : > On Dec 3, 2014, at 11:35 PM, Artem Polyakov wrote: > > > Jeff, I must admit that I don't completely understand how your fix works. > Can you explain to me why this variant was failing: > > > >

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-04 Thread Jeff Squyres (jsquyres)
On Dec 3, 2014, at 11:35 PM, Artem Polyakov wrote: > Jeff, I must admit that I don't completely understand how your fix works. Can > you explain to me why this variant was failing: > > CPPFLAGS="-I$srcdir/opal/libltdl/" > AC_EGREP_HEADER([lt_dladvise_init],

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-03 Thread Artem Polyakov
Jeff, I must admit that I don't completely understand how your fix works. Can you explain to me why this variant was failing: CPPFLAGS="-I$srcdir/opal/libltdl/" AC_EGREP_HEADER([lt_dladvise_init], [$srcdir/opal/libltdl/ltdl.h] while the new one: CPPFLAGS="-I$srcdir -I$srcdir/opal/libltdl/"

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-03 Thread Howard Pritchard
Hello Artem, No, but I was also told by schedmd that the slurm we have on our systems is ancient. So I'm no longer considering this problem very important. We have a workaround of always configuring with --disable-dlopen. Thanks, Howard 2014-12-02 20:59 GMT-07:00 Artem Polyakov

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-03 Thread Jeff Squyres (jsquyres)
Thanks! On Dec 3, 2014, at 7:03 AM, Artem Polyakov wrote: > > > On Wednesday, December 3, 2014, Jeff Squyres (jsquyres) wrote: > They were equivalent until yesterday. :-) > I see. Got that! > > I was going to file a PR to bring the changes over to v1.8, but not

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-03 Thread Artem Polyakov
On Wednesday, December 3, 2014, Jeff Squyres (jsquyres) wrote: > They were equivalent until yesterday. :-) I see. Got that! > > I was going to file a PR to bring the changes over to v1.8, but not until > they had shaken out on master. > > Would you mind filing a PR? Sure, will do that

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-03 Thread Jeff Squyres (jsquyres)
They were equivalent until yesterday. :-) I was going to file a PR to bring the changes over to v1.8, but not until they had shaken out on master. Would you mind filing a PR? On Dec 3, 2014, at 5:56 AM, Artem Polyakov wrote: > I finally found the clear reason for this

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-03 Thread Artem Polyakov
I finally found the clear reason for this strange situation! In ompi, opal_setup_libltdl.m4 has the following content: CPPFLAGS="-I$srcdir -I$srcdir/opal/libltdl" AC_EGREP_HEADER([lt_dladvise_init], [opal/libltdl/ltdl.h], [OPAL_HAVE_LTDL_ADVISE=1]) And in ompi-release
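
Editor's sketch (not part of the original message): AC_EGREP_HEADER preprocesses an `#include` of the named header and searches the output for the pattern, so the header argument has to resolve against one of the `-I` directories in CPPFLAGS. The Python below is only a rough emulation of that lookup, under stated assumptions: it ignores system include directories, uses a substring match instead of an egrep regex on preprocessed output, and builds a throwaway source tree with a stand-in `ltdl.h`. The helper names `find_header`, `egrep_header`, and `demo` are hypothetical.

```python
import os
import tempfile


def find_header(header, include_dirs):
    """Return the first existing path for `header` under the -I dirs, else None."""
    for directory in include_dirs:
        candidate = os.path.join(directory, header)
        if os.path.isfile(candidate):
            return candidate
    return None


def egrep_header(pattern, header, include_dirs):
    """Rough stand-in for AC_EGREP_HEADER: is `pattern` present in the header?"""
    path = find_header(header, include_dirs)
    if path is None:
        return False  # header not reachable through any -I directory
    with open(path) as handle:
        return pattern in handle.read()


def demo():
    """Compare a lone -I$srcdir/opal/libltdl with -I$srcdir plus that directory."""
    with tempfile.TemporaryDirectory() as srcdir:
        ltdl_dir = os.path.join(srcdir, "opal", "libltdl")
        os.makedirs(ltdl_dir)
        # Stand-in header; the real ltdl.h declaration differs.
        with open(os.path.join(ltdl_dir, "ltdl.h"), "w") as handle:
            handle.write("int lt_dladvise_init(void *advise);\n")

        header = "opal/libltdl/ltdl.h"
        only_ltdl = egrep_header("lt_dladvise_init", header, [ltdl_dir])
        with_srcdir = egrep_header("lt_dladvise_init", header, [srcdir, ltdl_dir])
        return only_ltdl, with_srcdir
```

In this emulation, `demo()` reports that `opal/libltdl/ltdl.h` is found only when the source root itself is on the include path, which is consistent with the two `-I` flags quoted above. It does not reproduce whatever the ompi-release variant did.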

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-02 Thread Ralph Castain
It is working for me, but I’m not sure if that is because of these changes or if it always worked for me. I haven’t tested the slurm integration in a while. > On Dec 2, 2014, at 7:59 PM, Artem Polyakov wrote: > > Howard, does current master fix your problems? > > On Wednesday, 3

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-02 Thread Artem Polyakov
Howard, does current master fix your problems? On Wednesday, December 3, 2014, Artem Polyakov wrote: > > 2014-12-03 8:30 GMT+06:00 Jeff Squyres (jsquyres) >: > >> On Dec 2, 2014, at 8:43 PM, Artem Polyakov

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-02 Thread Artem Polyakov
2014-12-03 8:30 GMT+06:00 Jeff Squyres (jsquyres) : > On Dec 2, 2014, at 8:43 PM, Artem Polyakov wrote: > > > Jeff, your fix breaks my system again. Actually you just reverted my > changes. > > No, I didn't just revert them -- I made changes. I did forget

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-02 Thread Jeff Squyres (jsquyres)
On Dec 2, 2014, at 8:43 PM, Artem Polyakov wrote: > Jeff, your fix breaks my system again. Actually you just reverted my changes. No, I didn't just revert them -- I made changes. I did forget about the second -I, though (to be fair, the 2nd -I was the *only* -I in there

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-02 Thread Artem Polyakov
Hello, Jeff, your fix breaks my system again. Actually you just reverted my changes. Here is what I have:
configure:5441: *** GNU libltdl setup
configure:296939: checking location of libltdl
configure:296952: result: internal copy
configure:297028: OPAL configuring in opal/libltdl

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-02 Thread Jeff Squyres (jsquyres)
I'm able to replicate Edgar's problem. I'm investigating... On Dec 2, 2014, at 10:39 AM, Edgar Gabriel wrote: > the mailing list refused to let me add the config.log file, since it is too > large, I can forward the output to you directly as well (as I did to Jeff). > > I

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-02 Thread Edgar Gabriel
the mailing list refused to let me add the config.log file, since it is too large, I can forward the output to you directly as well (as I did to Jeff). I honestly have not looked into the configure logic, I can just tell that OPAL_HAVE_LTDL_ADVISE is not set on my linux system for master, but

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-02 Thread Artem Polyakov
2014-12-02 20:59 GMT+06:00 Edgar Gabriel : > didn't want to interfere with this thread, although I have a similar > issue, since I have the solution nearly fully cooked up. But anyway, this > last email gave the hint on why we have suddenly the problem in ompio: > > it looks

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-02 Thread Edgar Gabriel
I checked with the debugger, that it did skip the entire section On 12/2/2014 9:04 AM, Jeff Squyres (jsquyres) wrote: Oy -- I thought we fixed that. :-( Are you saying that configure output says that ltdladvise is not found? On Dec 2, 2014, at 9:59 AM, Edgar Gabriel

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-02 Thread Jeff Squyres (jsquyres)
Oy -- I thought we fixed that. :-( Are you saying that configure output says that ltdladvise is not found? On Dec 2, 2014, at 9:59 AM, Edgar Gabriel wrote: > didn't want to interfere with this thread, although I have a similar issue, > since I have the solution nearly

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-02 Thread Edgar Gabriel
didn't want to interfere with this thread, although I have a similar issue, since I have the solution nearly fully cooked up. But anyway, this last email gave the hint on why we have suddenly the problem in ompio: it looks like OPAL_HAVE_LTDL_ADVISE (at least on my systems) is not set

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-02 Thread Jeff Squyres (jsquyres)
Looks like I was totally lying in http://www.open-mpi.org/community/lists/devel/2014/12/16381.php (where I said we should not use RTLD_GLOBAL). We *do* use RTLD_GLOBAL: https://github.com/open-mpi/ompi/blob/master/opal/mca/base/mca_base_component_repository.c#L124 This ltdl advice object is

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-02 Thread Artem Polyakov
Agreed. The first thing to check is what value OPAL_HAVE_LTDL_ADVISE is set to. If it is zero, this is very probably the same bug as mine. 2014-12-02 17:33 GMT+06:00 Ralph Castain : > It does look similar - question is: why didn’t this fix the problem? Will > have to
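
Editor's sketch (not part of the original message): one way to carry out the check suggested above is to look for the macro in the header that configure generates. The snippet assumes the value ends up as a plain `#define` line in a generated config header; the sample text and the helper name `macro_value` are hypothetical, and the exact file that holds the definition in an Open MPI build tree is not stated in the thread.

```python
import re


def macro_value(config_text, name):
    """Return the value of `#define NAME value` in config_text, or None."""
    pattern = re.compile(
        r"^\s*#\s*define\s+" + re.escape(name) + r"\s+(\S+)",
        re.MULTILINE,
    )
    match = pattern.search(config_text)
    return match.group(1) if match else None


# Hypothetical excerpt of a generated config header.
sample = """\
/* Whether libltdl appears to have the lt_dladvise interface */
#define OPAL_HAVE_LTDL_ADVISE 0
"""

value = macro_value(sample, "OPAL_HAVE_LTDL_ADVISE")
```

With a real build tree, the same function could be fed the contents of the generated header read from disk; a result of `"0"` would correspond to the situation described in this message.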

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-02 Thread Artem Polyakov
2014-12-02 17:13 GMT+06:00 Ralph Castain : > Hmmm…if that is true, then it didn’t fix this problem as it is being > reported in the master. > I had this problem on my laptop installation. You can check my report (it was detailed enough) and see if you are hitting the same issue. My

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-02 Thread Artem Polyakov
I think this might be related to the configuration problem I was fixing with Jeff a few months ago. Refer here: https://github.com/open-mpi/ompi/pull/240 2014-12-02 10:15 GMT+06:00 Ralph Castain : > If it isn’t too much trouble, it would be good to confirm that it remains >

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-01 Thread Ralph Castain
If it isn’t too much trouble, it would be good to confirm that it remains broken. I strongly suspect it is based on Moe’s comments. Obviously, other people are making this work. For Intel MPI, all you do is point it at libpmi and they can run. However, they do explicitly dlopen it in their

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-01 Thread Gilles Gouaillardet
$ srun --version
slurm 2.6.6-VENDOR_PROVIDED
$ srun --mpi=pmi2 -n 1 ~/hw
I am 0 / 1
$ srun -n 1 ~/hw
/csc/home1/gouaillardet/hw: symbol lookup error: /usr/lib64/slurm/auth_munge.so: undefined symbol: slurm_verbose
srun: error: slurm_receive_msg: Zero Bytes were transmitted or received
srun:

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-01 Thread Ralph Castain
Out of curiosity - how are you testing these? I have more current versions of Slurm and would like to test the observations there. > On Dec 1, 2014, at 7:49 PM, Gilles Gouaillardet > wrote: > > I'd like to take a step back ... > > i previously tested with slurm

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-01 Thread Gilles Gouaillardet
I'd like to take a step back ... i previously tested with slurm 2.6.0, and it complained about the slurm_verbose symbol that is defined in libslurm.so so with slurm 2.6.0, RTLD_GLOBAL or relinking is ok now i tested with slurm 2.6.6 and it complains about the slurm_auth_get_arg_desc symbol, and

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-01 Thread Gilles Gouaillardet
Jeff, FWIW, you can read my analysis of what is going wrong at http://www.open-mpi.org/community/lists/pmix-devel/2014/11/0293.php bottom line, i agree this is a slurm issue (slurm plugins should depend on libslurm, but they do not yet) a possible workaround would be to make the pmi component a

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-01 Thread Jeff Squyres (jsquyres)
Ok, if the problem is moot, great. (sidenote: this is moot, so ignore this if you want: with this explanation, I'm still not sure how RTLD_GLOBAL fixes the issue) On Dec 1, 2014, at 5:15 PM, Ralph Castain wrote: > Easy enough to explain. We link libpmi into the pmix/s1

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-01 Thread Ralph Castain
Easy enough to explain. We link libpmi into the pmix/s1 component. This library is missing the linkage to libslurm that contains the linkage to libauth where munge resides. So when we call a PMI function, libpmi references a call to munge for authentication and hits an “unresolved symbol”
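
Editor's sketch (not part of the original message): the flag in the thread title controls whether a dlopen'd library's symbols can be used to resolve symbols of libraries loaded afterwards. With RTLD_LOCAL they cannot; with RTLD_GLOBAL they can. The snippet below only illustrates that flag choice through Python's `ctypes`, which passes `mode` to the underlying dlopen call on POSIX systems. It is not Open MPI's component loader, and the helper names `dlopen_mode` and `load_library` are hypothetical.

```python
import ctypes


def dlopen_mode(export_symbols):
    """Pick the dlopen flag: RTLD_GLOBAL exposes symbols to later loads."""
    return ctypes.RTLD_GLOBAL if export_symbols else ctypes.RTLD_LOCAL


def load_library(path, export_symbols=False):
    """Load a shared library with the chosen symbol visibility.

    With export_symbols=True, a library loaded later that lacks an explicit
    link-time dependency may still resolve its undefined symbols against
    this one.
    """
    return ctypes.CDLL(path, mode=dlopen_mode(export_symbols))
```

As a hypothetical usage, `load_library("libslurm.so", export_symbols=True)` before loading a plugin that references libslurm symbols without linking to it would make those symbols visible to the plugin. Whether that matches the exact failure discussed in this thread is not settled by the messages shown here.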

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-01 Thread Jeff Squyres (jsquyres)
On Dec 1, 2014, at 5:07 PM, Ralph Castain wrote: > FWIW: It’s Slurm’s pmi-1 library that isn’t linked correctly against its > dependencies (the pmi-2 one is correct). Moe is aware of the problem and > fixing it on their side. This won’t help existing installations until

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-01 Thread Ralph Castain
FWIW: It’s Slurm’s pmi-1 library that isn’t linked correctly against its dependencies (the pmi-2 one is correct). Moe is aware of the problem and fixing it on their side. This won’t help existing installations until they upgrade, but I tend to agree with Jeff about not fixing other people’s

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-01 Thread Jeff Squyres (jsquyres)
On Dec 1, 2014, at 4:07 PM, Howard Pritchard wrote: > There has been some discussion of end case situations with use of dlopen > in the ompi mca framework that can lead to unresolved symbols when > subsequent shared libraries are dlopen'd that might need symbols from > a

[OMPI devel] RTLD_GLOBAL question

2014-12-01 Thread Howard Pritchard
Hi ompi developers, If you always configure ompi with --disable-dlopen you can delete this message now. There has been some discussion of end case situations with use of dlopen in the ompi mca framework that can lead to unresolved symbols when subsequent shared libraries are dlopen'd that might