Re: [OMPI devel] Shared object dependencies

2018-06-08 Thread Gabriel, Edgar
I wanted to add one item before I forget (although I agree with what Jeff said): the error messages shown remind me of the problem that we had with ompio in the 1.8/1.10 series when the RTLD_GLOBAL option was not correctly set. However, that was fixed in the 2.0 series and going forward, so if

Re: [OMPI devel] Shared object dependencies

2018-06-12 Thread Gabriel, Edgar
> (among other abstraction violations)
>
> What about following up in github ?
>
> Cheers,
>
> Gilles
>
> On Tuesday, June 12, 2018, Gabriel, Edgar wrote:
> > So, I am still surprised to see this error message: if you look at let

Re: [OMPI devel] Shared object dependencies

2018-06-12 Thread Gabriel, Edgar
So, I am still surprised to see this error message: if you look at, let's say, just one error message (and all others are the same):
> > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so:
> > undefined symbol:

Re: [OMPI devel] Shared object dependencies

2018-06-12 Thread Gabriel, Edgar
Well, I am still confused. What is different on NixOS vs. other Linux distros that makes this error appear, and is it relevant enough for the backport, or should we just go forward for 4.0? Is it again an RTLD_GLOBAL issue, as it was back in 2014? And last but not least, I raised on the GitHub
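
For readers unfamiliar with the RTLD_GLOBAL question raised in this thread, here is a minimal sketch of what the flag changes, written in Python with ctypes rather than taken from the Open MPI code. A library opened with RTLD_GLOBAL makes its symbols available for resolving references in libraries loaded later; with RTLD_LOCAL, a later plugin that depends on those symbols and does not link the library itself can fail with an "undefined symbol" error. The helper names `dlopen_flags` and `load_library` are hypothetical, and whether this is the cause of the error reported here is exactly the open question in the thread.

```python
import ctypes
import os


def dlopen_flags(make_global: bool) -> int:
    """Build dlopen() mode flags (hypothetical helper, POSIX only).

    RTLD_GLOBAL exports the library's symbols to libraries loaded
    afterwards; RTLD_LOCAL keeps them private to this handle.
    """
    visibility = os.RTLD_GLOBAL if make_global else os.RTLD_LOCAL
    return os.RTLD_NOW | visibility


def load_library(path, make_global=True):
    """Open a shared object with the chosen symbol visibility.

    `path` is whatever library the caller wants to load; nothing here
    is specific to Open MPI's component loader.
    """
    return ctypes.CDLL(path, mode=dlopen_flags(make_global))
```

A base library would be opened with `load_library(path, make_global=True)` before any plugin that relies on its symbols is opened.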

[OMPI devel] Open MPI with psm2 on master

2018-06-17 Thread Gabriel, Edgar
I recently had problems running ompi master on our Omnipath cluster; 3.0 and 3.1, however, work without problems. After some digging, I found that I have to set the environment variable PSM2_MULTI_EP for master to work at all for us. Not sure whether this is intended or an inadvertent
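
As an illustration of the workaround described above, here is a small sketch of passing the variable to the launched processes with mpirun's `-x` option. The helper name `mpirun_command`, the program `./a.out`, the process count, and in particular the value `1` are assumptions made for the example; the message only says that PSM2_MULTI_EP has to be set, not what it was set to.

```python
def mpirun_command(program, nprocs, env=None):
    """Build an mpirun argument list that exports the given variables.

    Hypothetical helper: each entry in `env` becomes `-x NAME=value`,
    which asks mpirun to set that variable for the launched processes.
    """
    cmd = ["mpirun", "-np", str(nprocs)]
    for name, value in (env or {}).items():
        cmd += ["-x", f"{name}={value}"]
    cmd.append(program)
    return cmd


# The value "1" is an assumption, not taken from the message.
print(" ".join(mpirun_command("./a.out", 2, {"PSM2_MULTI_EP": "1"})))
```

The resulting list can be passed to `subprocess.run` as is, which avoids shell quoting problems.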

Re: [OMPI devel] Announcing Open MPI v4.0.0rc1

2018-09-19 Thread Gabriel, Edgar
I performed some tests on our Omnipath cluster, and I have a mixed bag of results with 4.0.0rc1.
1. Good news: the problems with the psm2 mtl that I reported in June/July seem to be fixed. However, I still get a warning every time I run a job with 4.0.0, e.g. compute-1-1.local.4351PSM2