[OMPI devel] RFC: how should Open MPI handle link-local addresses

2015-05-21 Thread Gilles Gouaillardet
Folks, this RFC is a follow-up of * issue 585 https://github.com/open-mpi/ompi/issues/585 * related PR 591 https://github.com/open-mpi/ompi/pull/591 As some of you might have already noticed, Open MPI fails if configure'd with --enable-ipv6 and ipv6 interfaces are found on the system. The r

Re: [OMPI devel] Open MPI collectives algorithm selection

2015-05-20 Thread Gilles Gouaillardet
fies why in your example the 2 proc communicators are > using the rule for 4. > > Using 0 as index for an algorithm selection redirects the decision to the > default, hard-coded, coll_tuned decision function, allowing the dynamic > rules to fall back to the predefined behavior. > &g

Re: [OMPI devel] Open MPI collectives algorithm selection

2015-05-20 Thread Gilles Gouaillardet
ile for the communicator in question. Howard 2015-05-19 20:05 GMT-06:00 Gilles Gouaillardet mailto:gil...@rist.or.jp>>: Folks, this is a follow-up of a discussion on the user ML started at http://www.open-mpi.org/community/lists/users/2015/05/26882.php

Re: [OMPI devel] Open MPI collectives algorithm selection

2015-05-20 Thread Gilles Gouaillardet
ching size in the rule file for the communicator in question. Howard 2015-05-19 20:05 GMT-06:00 Gilles Gouaillardet <mailto:gil...@rist.or.jp>>: Folks, this is a follow-up of a discussion on the user ML started at http://www.open-mpi.org/community/lists/users/2015/05/26

[OMPI devel] Open MPI collectives algorithm selection

2015-05-19 Thread Gilles Gouaillardet
Folks, this is a follow-up of a discussion on the user ML started at http://www.open-mpi.org/community/lists/users/2015/05/26882.php 1) it turns out the dynamic rule file must be "sorted" : - rules must be sorted by communicator size - within a given communicator size, rules must be sorted

Re: [OMPI devel] Proposal: update Open MPI's version number and release process

2015-05-18 Thread Gilles Gouaillardet
Hi Mark, ideally, we would like to use a single repository with the following constraints : - all Open MPI developers can commit to the master - only Release Manager and Gatekeepers can commit to the release branch (v1.8, ...) unfortunately, github does not (yet ?) implement per branch access

Re: [OMPI devel] c_accumulate

2015-04-20 Thread Gilles Gouaillardet
ilure similar to this ticket? https://github.com/open-mpi/ompi/issues/393 Rolf *From:*devel [mailto:devel-boun...@open-mpi.org] *On Behalf Of *Gilles Gouaillardet *Sent:* Monday, April 20, 2015 9:12 AM *To:* Open MPI Developers *Subject:* [OMPI devel] c_accumulate Folks, i (sometimes) get some failur

Re: [OMPI devel] c_accumulate

2015-04-20 Thread Gilles Gouaillardet
milar to this ticket? https://github.com/open-mpi/ompi/issues/393 Rolf *From:*devel [mailto:devel-boun...@open-mpi.org] *On Behalf Of *Gilles Gouaillardet *Sent:* Monday, April 20, 2015 9:12 AM *To:* Open MPI Developers *Subject:* [OMPI devel] c_accumulate Folks, i (sometimes) get some failur

Re: [OMPI devel] c_accumulate

2015-04-20 Thread Gilles Gouaillardet
github.com/open-mpi/ompi/issues/393 Rolf *From:*devel [mailto:devel-boun...@open-mpi.org] *On Behalf Of *Gilles Gouaillardet *Sent:* Monday, April 20, 2015 9:12 AM *To:* Open MPI Developers *Subject:* [OMPI devel] c_accumulate Folks, i (sometimes) get some failure with the c_accumulate test fr

[OMPI devel] c_accumulate

2015-04-20 Thread Gilles Gouaillardet
Folks, i (sometimes) get some failure with the c_accumulate test from the ibm test suite on one host with 4 mpi tasks so far, i was only able to observe this on linux/sparc with the vader btl here is a snippet of the test : MPI_Win_create(&RecvBuff, sizeOfInt, 1, MPI_INFO_NULL,
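For context, here is a minimal, self-contained sketch of the kind of one-sided accumulate such a test exercises; it is not the ibm test suite source, and the MPI_SUM reduction, buffer names and fence synchronization are illustrative assumptions.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, nprocs, RecvBuff = 0, one = 1;
    MPI_Aint sizeOfInt = (MPI_Aint) sizeof(int);
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* expose one int per task, displacement unit of 1 byte as in the snippet */
    MPI_Win_create(&RecvBuff, sizeOfInt, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    /* every task adds 1 into rank 0's window */
    MPI_Accumulate(&one, 1, MPI_INT, 0, 0, 1, MPI_INT, MPI_SUM, win);
    MPI_Win_fence(0, win);

    if (0 == rank && RecvBuff != nprocs) {
        printf("unexpected result %d (expected %d)\n", RecvBuff, nprocs);
    }

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}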

Re: [OMPI devel] running Open MPI with different install paths

2015-04-17 Thread Gilles Gouaillardet
emons, so we can't > self-discover it. > > > On Fri, Apr 17, 2015 at 2:32 AM, Gilles Gouaillardet > wrote: > >> Folks, >> >> i am trying to run heterogeneous Open MPI. >> all my nodes use NFS everything is shared, so i need to manually specify >>

[OMPI devel] running Open MPI with different install paths

2015-04-17 Thread Gilles Gouaillardet
Folks, i am trying to run heterogeneous Open MPI. all my nodes use NFS, everything is shared, so i need to manually specify that x86_64 nodes must use /.../ompi-x86_64 and sparcv9 nodes must use /.../ompi-sparcv9. is there a simple way to achieve this ? Cheers, Gilles

Re: [OMPI devel] Common symbols warning

2015-04-15 Thread Gilles Gouaillardet
Dave, my bad, the error is ignored as it should. i will then close the related PR since it is now irrelevant Cheers, Gilles On 4/16/2015 12:30 AM, Dave Goodell (dgoodell) wrote: On Apr 14, 2015, at 11:02 PM, Gilles Gouaillardet wrote: Dave, my understanding is that the presence of common

Re: [OMPI devel] Common symbols warning

2015-04-15 Thread Gilles Gouaillardet
Dave, my understanding is that the presence of common symbols should be treated as a warning (and hence make install should not fail) makes sense ? Cheers, Gilles On 4/15/2015 12:14 PM, Ralph Castain wrote: Dave committed this earlier today, and here is the first error report: WARNING! Co

Re: [OMPI devel] Unable to execute development version

2015-03-27 Thread Gilles Gouaillardet
Federico, can you try mpirun -mca sec basic ... cheers, Gilles On Saturday, March 28, 2015, Federico Reghenzani < federico1.reghenz...@mail.polimi.it> wrote: > Hello all. > I'm working together with Gianmario to Barbeque > -OpenMPI interface. I downloaded the last >

Re: [OMPI devel] Opal atomics question

2015-03-26 Thread Gilles Gouaillardet
Nathan, Fujitsu MPI is openmpi based and is running on their sparcv9-like proc. Cheers, Gilles On Friday, March 27, 2015, Nathan Hjelm wrote: > > As a follow-on. How many of our supported architectures should we > continue to support. The current supported list is: > > alpha > amd64* > arm* >

Re: [OMPI devel] [OMPI users] Configuration error with external hwloc

2015-03-23 Thread Gilles Gouaillardet
Peter, i was able to reproduce the issue when the external hwloc libraries are not in the default lib path (e.g. /usr/lib64) a simple workaround is to LD_LIBRARY_PATH=/path_to_your_hwloc_lib configure ... /* libevent configure does compile a test program with -L/path_to_your_hwloc_lib -lhwloc, a

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-1250-g9107bf5

2015-03-09 Thread Gilles Gouaillardet
, so there is still an issue here /* e.g. cannot assign new_comm->c_topo, nor invoke ompi_comm_free(&new_comm) */ i will think of a correct fix from now, and in the meantime, i will welcome your advice :-) Cheers, Gilles On 2015/03/10 10:08, Gilles Gouaillardet wrote: >

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-1250-g9107bf5

2015-03-09 Thread Gilles Gouaillardet
t are new to this repository have > > not appeared on any other notification email; so we list those > > revisions in full, below. > > > > - Log - > > > https://github.com/open-mpi/ompi/commit/9107bf50776d54

[OMPI devel] nightly tarballs

2015-03-03 Thread Gilles Gouaillardet
Folks, the latest tarballs for both master and v1.8 were generated on Feb 28 2015. among other things, that means the latest coverity report is from Feb 28. is there something wrong ? (and is someone taking care of it ?) Cheers, Gilles

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-1046-g004160f

2015-02-26 Thread Gilles Gouaillardet
hub.com/open-mpi/ompi/commit/004160f8da97be1f29aefeaaa51cf52298e0d3a4 >> >> commit 004160f8da97be1f29aefeaaa51cf52298e0d3a4 >> Author: Gilles Gouaillardet >> Date: Mon Feb 23 13:45:23 2015 +0900 >> >>coll/tuned: silence CID 1269934 >> >> diff --git

Re: [OMPI devel] Fortran issue

2015-02-20 Thread Gilles Gouaillardet
George, this is correctly handled in ompi_testany_f : /* Increment index by one for fortran conventions. Note that all Fortran compilers have FALSE==0; we just need to check for any nonzero value (because TRUE is not always 1) */ Cheers, Gilles On 2015/02/20 1:15
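A self-contained sketch of the two conversions referred to above, i.e. shifting the C 0-based index to Fortran's 1-based convention and treating any nonzero LOGICAL value as true; this is not the actual ompi_testany_f code, just an illustration of the idiom.

#include <mpi.h>   /* only for MPI_UNDEFINED */
#include <stdio.h>

/* convert the C result of a Testany-style call to Fortran conventions */
static void c_to_fortran(int c_index, int c_flag, int *index_f, int *flag_f)
{
    /* Fortran array indices start at 1; MPI_UNDEFINED is passed through unchanged */
    *index_f = (MPI_UNDEFINED == c_index) ? c_index : c_index + 1;
    /* FALSE is 0 on all Fortran compilers; any nonzero value must count as TRUE */
    *flag_f = (0 != c_flag) ? 1 : 0;
}

int main(void)
{
    int index_f, flag_f;
    c_to_fortran(2, -1, &index_f, &flag_f);   /* e.g. third request completed */
    printf("fortran index=%d flag=%d\n", index_f, flag_f);   /* prints 3 and 1 */
    return 0;
}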

[OMPI devel] git commit id in coverity

2015-02-16 Thread Gilles Gouaillardet
Folks, from the coverity web interface, i could not find the git commit id of open mpi that was last analyzed (all i could find is the state of the last analysis) is this information available, and where ? Cheers, Gilles

Re: [OMPI devel] OBJ_RELEASE() question

2015-02-12 Thread Gilles Gouaillardet
not free'd >>> but set to NULL. If I call it again the buffer is NULL and the original >>> buffer will not be free'd. Setting the buffer to NULL seems unnecessary. >>> >>> I have not seen this as a problem in the code I was just tryi

Re: [OMPI devel] OBJ_RELEASE() question

2015-02-12 Thread Gilles Gouaillardet
Adrian, opal_obj_update does not fail or success, it returns the new obj_reference_count. can you point to one specific location in the code where you think it is wrong ? OBJ_RELEASE(buffer) buffer = NULL; could be written as if (((opal_object_t *)buffer)->obj_reference_count == 1) { OBJ_
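To illustrate the pattern under discussion, here is a self-contained mock of the reference-counting semantics; it is not the OPAL implementation (OBJ_RELEASE and opal_obj_update are more involved), only a sketch of why the caller resets the pointer to NULL after releasing its reference.

#include <stdio.h>
#include <stdlib.h>

typedef struct { int obj_reference_count; } object_t;

static object_t *obj_new(void)
{
    object_t *o = malloc(sizeof(*o));
    o->obj_reference_count = 1;
    return o;
}

/* mock of OBJ_RELEASE: drop one reference, destroy when the last one goes away */
#define OBJ_RELEASE_MOCK(o)                        \
    do {                                           \
        if (0 == --(o)->obj_reference_count) {     \
            free(o);                               \
        }                                          \
    } while (0)

int main(void)
{
    object_t *buffer = obj_new();

    /* the pattern being discussed: release our reference, then reset the
     * pointer so later code cannot use or release the freed object again */
    OBJ_RELEASE_MOCK(buffer);
    buffer = NULL;

    if (NULL == buffer) {
        printf("buffer released and reset to NULL\n");
    }
    return 0;
}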

Re: [OMPI devel] OMPI devel] RoCE plus QDR IB tunable parameters

2015-02-06 Thread Gilles Gouaillardet
Dave, These settings tell ompi to use native infiniband on the ib qdr port and tcp/ip on the other port. From the faq, roce is implemented in the openib btl http://www.open-mpi.org/faq/?category=openfabrics#ompi-over-roce Did you use --mca btl_openib_cpc_include rdmacm in your first tests ?

Re: [OMPI devel] OMPI devel] OMPI devel] Master hangs in opal_fifo test

2015-02-06 Thread Gilles Gouaillardet
lca wrote: >On Fri, Feb 6, 2015 at 8:54 AM, Gilles Gouaillardet > wrote: > >George, > >Can you point me to an other project that uses 128 bits atomics ? > > >http://icl.cs.utk.edu/parsec/. It heavily uses lock-free structures, and the >128 bits atomics are the

Re: [OMPI devel] OMPI devel] Master hangs in opal_fifo test

2015-02-06 Thread Gilles Gouaillardet
line 61)? > > >On Wed, Feb 4, 2015 at 11:30 PM, Gilles Gouaillardet > wrote: > >Paul and all, > >i just pushed >https://github.com/open-mpi/ompi/commit/b42e3441294e9fe787fe8e9ad7403d5b8e465163 > >when a buggy compiler is detected, configure now forces OPAL_HAVE

Re: [OMPI devel] Master hangs in opal_fifo test

2015-02-04 Thread Gilles Gouaillardet
, Gilles Gouaillardet wrote: > Paul, > > my previous email was misleading. > > what i really meant is the opal_fifo test works fine with icc 2013u5 > (the release before 2013sp1) and > icc 2013sp1u2 and later > > so even if the reproducer fails with icc older than 2013sp1u2, t

Re: [OMPI devel] Master hangs in opal_fifo test

2015-02-04 Thread Gilles Gouaillardet
e == b.value' failed. > Aborted > @ Testing Intel compiler version 14.0.0.080 > a.out: conftest.c:36: main: Assertion `a.value == b.value' failed. > Aborted > @ Testing Intel compiler version 14.0.1.106 > a.out: conftest.c:36: main: Assertion `a.value == b.value' fa

Re: [OMPI devel] Master hangs in opal_fifo test

2015-02-04 Thread Gilles Gouaillardet
Nathan, imho, this is a compiler bug and only two versions are affected : - intel icc 14.0.0.080 (aka 2013sp1) - intel icc 14.0.1.106 (aka 2013sp1u1) /* note the bug only occurs with -O1 and higher optimization levels */ here is attached a simple reproducer a simple workaround is to configure wi

Re: [OMPI devel] Master hangs in opal_LIFO test

2015-02-03 Thread Gilles Gouaillardet
Paul, George and i were able to reproduce this issue with icc 14.0 but not with icc 14.3 and later. i am trying to see how the difference/bug could be automatically handled Cheers, Gilles On 2015/02/03 16:18, Paul Hargrove wrote: > CORRECTION: > > It is the opal_lifo (not fifo) test which hung

Re: [OMPI devel] Great meeting!

2015-01-30 Thread Gilles Gouaillardet
Hi Jeff, let me update the --with-threads configure option. it has been removed from the master : commit 7a55d49ca78bcc157749c04027515f12b026ec33 Author: Gilles Gouaillardet List-Post: devel@lists.open-mpi.org Date: Tue Oct 21 19:13:19 2014 +0900 configury: per RFC, remove the --with

Re: [OMPI devel] One sided tests

2015-01-21 Thread Gilles Gouaillardet
George, a tentative fix is available at https://github.com/open-mpi/ompi/pull/355 i asked Nathan to review it before it lands into the master Cheers, Gilles On 2015/01/22 7:08, George Bosilca wrote: > Current trunk compiled with any compiler (gcc or icc) fails the one sided > tests from mpi_te

Re: [OMPI devel] btl_openib.c:1200: mca_btl_openib_alloc: Assertion `qp != 255' failed

2015-01-20 Thread Gilles Gouaillardet
the stride is 0, this datatype has a memory >> layout that includes 2 times the same int. I'm not sure this was indeed >> intended... >> >> George. >> >> >> On Mon, Jan 19, 2015 at 12:17 AM, Gilles Gouaillardet >> wrote: >> Adrian,

Re: [OMPI devel] Failures

2015-01-19 Thread Gilles Gouaillardet
compiler version Cheers, Gilles On 2015/01/17 0:19, George Bosilca wrote: > Your patch solve the issue with opal_tree. The opal_lifo remains broken. > > George. > > > On Fri, Jan 16, 2015 at 5:12 AM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: &

Re: [OMPI devel] btl_openib.c:1200: mca_btl_openib_alloc: Assertion `qp != 255' failed

2015-01-19 Thread Gilles Gouaillardet
e. please note Jeff recently pushed a patch related to that and this message might be a false positive. Cheers, Gilles On 2015/01/19 14:17, Gilles Gouaillardet wrote: > Adrian, > > i just fixed this in the master > (https://github.com/open-mpi/ompi/commit/d14daf40d041f7a0a8e9d85b3bf

Re: [OMPI devel] btl_openib.c:1200: mca_btl_openib_alloc: Assertion `qp != 255' failed

2015-01-19 Thread Gilles Gouaillardet
Adrian, i just fixed this in the master (https://github.com/open-mpi/ompi/commit/d14daf40d041f7a0a8e9d85b3bfd5eb570495fd2) the root cause is a corner case that was not handled correctly : MPI_Type_hvector(2, 1, 0, MPI_INT, &type); type has extent = 4 *but* size = 8 ob1 used to test only the
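The corner case can be reproduced with a few lines of MPI; the sketch below is an assumption about the shape of the original report, not the ob1 fix itself, and shows how a zero-stride hvector ends up with a size larger than its extent.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Datatype type;
    MPI_Aint lb, extent;
    int size;

    MPI_Init(&argc, &argv);

    /* two blocks of one int each, with a stride of 0 bytes: both blocks
     * overlay the same int, so the data size exceeds the type's extent */
    MPI_Type_hvector(2, 1, 0, MPI_INT, &type);
    MPI_Type_commit(&type);

    MPI_Type_get_extent(type, &lb, &extent);
    MPI_Type_size(type, &size);
    printf("extent = %ld, size = %d\n", (long) extent, size);   /* 4 and 8 with 4-byte int */

    MPI_Type_free(&type);
    MPI_Finalize();
    return 0;
}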

Re: [OMPI devel] pthreads (was: Re: RFC: remove --disable-smp-locks)

2015-01-16 Thread Gilles Gouaillardet
Folks, i pushed two commits in order to remove the --with-threads option and the dead code : commit 7a55d49ca78bcc157749c04027515f12b026ec33 Author: Gilles Gouaillardet List-Post: devel@lists.open-mpi.org Date: Tue Oct 21 19:13:19 2014 +0900 configury: per RFC, remove the --with-threads

Re: [OMPI devel] Failures

2015-01-16 Thread Gilles Gouaillardet
George, i pushed https://github.com/open-mpi/ompi/commit/ac16970d21d21f529f1ec01ebe0520843227475b in order to get the intel compiler work with ompi Cheers, Gilles On 2015/01/16 17:29, Gilles Gouaillardet wrote: > George, > > i was unable to reproduce the hang with icc 14.0.3.174 and g

Re: [OMPI devel] Failures

2015-01-16 Thread Gilles Gouaillardet
George, i was unable to reproduce the hang with icc 14.0.3.174 and greater on a RHEL6 like distro. i was able to reproduce the opal_tree failure and found two possible workarounds : a) manually compile opal/class/opal_tree.lo *without* the -finline-functions flag b) update deserialize_add_tree_it

Re: [OMPI devel] pthreads (was: Re: RFC: remove --disable-smp-locks)

2015-01-07 Thread Gilles Gouaillardet
2015, at 4:25 AM, Gilles Gouaillardet > wrote: > >> Talking about thread support ... >> >> i made an RFC several months ago in order to remove the >> --with-threads option from configure >> >> /* ompi requires pthreads, no more, no less */ > >Did we

Re: [OMPI devel] RFC: remove --disable-smp-locks

2015-01-07 Thread Gilles Gouaillardet
Talking about thread support ... i made an RFC several months ago in order to remove the --with-threads option from configure /* ompi requires pthreads, no more, no less */ it was accepted, but i could not find the time to implement it ... basically, i can see three steps : 1) remove the --wit

Re: [OMPI devel] ompi-master build error : make can require autotools

2015-01-06 Thread Gilles Gouaillardet
e recent automake version (but which one ?) Cheers, Gilles On 2015/01/07 2:02, Dave Goodell (dgoodell) wrote: > On Jan 5, 2015, at 8:40 PM, Gilles Gouaillardet > wrote: > >> Dave, >> >> what if you do >> >> touch ompi/include/mpi.h.in && sleep 1 &

Re: [OMPI devel] ompi-master build error : make can require autotools

2015-01-05 Thread Gilles Gouaillardet
timestamp on opal_config_ptrheads.m4 was the only source state difference > between the two runs. So I don't know what is causing your problem, but it's > not the rule you pointed out in the generated makefile. Perhaps you are > building on NFS and this is causing you som

Re: [OMPI devel] problem running jobs on ompi-master

2014-12-26 Thread Gilles Gouaillardet
Edgar, First, make sure your master includes https://github.com/open-mpi/ompi/commit/05af80b3025dbb95bdd4280087450791291d7219 If this is not enough, try with --mca coll ^ml Hope this helps Gilles. Mail from Edgar Gabriel: >I have some problems running jobs with ompi-master on one of our >clus

Re: [OMPI devel] OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-612-g05af80b

2014-12-24 Thread Gilles Gouaillardet
Ralph, I had second thoughts on what i wrote earlier, and i think the code is correct. e.g. reply cannot be used uninitialized. That being said, i think reply should be initialized to null and OBJ_RELEASE'd if not null on exit in order to avoid a memory leak. Sorry for the confusion, Gilles

[OMPI devel] mpirun hang (regression in bffb2b7a4bb49c9188d942201b8a8f04872ff63c)

2014-12-24 Thread Gilles Gouaillardet
Ralph, i tried to debug the issue reported by Siegmar at http://www.open-mpi.org/community/lists/users/2014/12/26052.php i have not been able to try this on a heterogeneous cluster yet, but i could reproduce a hang with 2 nodes and 3 tasks : mpirun -host node0,node1 -np 3 --mca btl tcp,self --m

Re: [OMPI devel] Different behaviour with MPI_IN_PLACE in MPI_Reduce_scatter() and MPI_Ireduce_scatter()

2014-12-23 Thread Gilles Gouaillardet
Lisandro, i fixed this in the master and made a PR for v1.8. this is a one liner, and you can find it at https://github.com/ggouaillardet/ompi-release/commit/0e478c1191715fff37e4deb56f8891774db62775 Cheers, Gilles On 2014/12/23 23:43, Lisandro Dalcin wrote: > On 28 September 2014 at 19:13, Geo

Re: [OMPI devel] ompi-master build error : make can require autotools

2014-12-22 Thread Gilles Gouaillardet
e affected) Cheers, Gilles On Tue, Dec 23, 2014 at 2:26 AM, Dave Goodell (dgoodell) wrote: > On Dec 22, 2014, at 2:42 AM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: > > > Jeff and all, > > > > i just found "by accident" that make can

[OMPI devel] ompi master, libfabric and static libraries

2014-12-22 Thread Gilles Gouaillardet
Jeff, MTT reported some errors when building some test suites : http://mtt.open-mpi.org/index.php?do_redir=2219 the root cause was some missing flags in the wrappers. i fixed that in 8976dcf6101412f6bd0080764d19a3e9d4edf570 there is now a second issue : libfabric requires libnl, but the -lnl fla

[OMPI devel] ompi-master build error : make can require autotools

2014-12-22 Thread Gilles Gouaillardet
Jeff and all, i just found "by accident" that make can require autotools. for example: from (generated) ompi/include/Makefile : $(srcdir)/mpi.h.in: $(am__configure_deps) ($(am__cd) $(top_srcdir) && $(AUTOHEADER)) rm -f stamp-h2 touch $@ and $(am__configure_deps) is a bu

Re: [OMPI devel] [1.8.4rc5] preliminary results

2014-12-19 Thread Gilles Gouaillardet
Paul, i faced the very same issue with open64-5.0 here is attached a simple reproducer. main2 can be built, but main cannot be built. the only difference is that unlike main.F90, main2.F90 contains a line : use, intrinsic :: iso_c_binding /* and they both link with libfoo.so, and foo.F90 *does*

Re: [OMPI devel] libfabric, config.h and hwloc

2014-12-19 Thread Gilles Gouaillardet
no config.h in my include path, and hence make fails. Cheers, Gilles On 2014/12/19 4:12, Jeff Squyres (jsquyres) wrote: > On Dec 18, 2014, at 3:13 AM, Gilles Gouaillardet > wrote: > >> currently, ompi master cannot be built if configured with >> --without-hwloc *a

[OMPI devel] libfabric, config.h and hwloc

2014-12-18 Thread Gilles Gouaillardet
Jeff, currently, ompi master cannot be built if configured with --without-hwloc *and without* --without-libfabric. the root cause is HAVE_CONFIG_H is defined but no config.h file is found. i dug a bit and found that config.h is taken from a hwloc directory (if the --without-hwloc option is no

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Gilles Gouaillardet
p.ibp0 >> >> Routing Table: IPv6 >> Destination/Mask Gateway Flags Ref Use >> If >> --- --- - --- --- >> - >> ::1 ::1

Re: [OMPI devel] OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Gilles Gouaillardet
i' >make[1]: *** [all-recursive] Error 1 >make[1]: Leaving directory >`/home/mpiteam/openmpi/nightly-tarball-build-root/v1.8/ompi-2014-12-16-211833/ompi/openmpi-v1.8.3-305-ge3ae27d/_build' >make: *** [distcheck] Error 1 >===

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Gilles Gouaillardet
2-g4e4f997 >> 2) openmpi-v1.8.4rc4 + adding -D_REENTRANT to CFLAGS and wrapper-cflags >> 3) openmpi-v1.8.4rc4 + adding -mt to CFLAGS and wrapper-cflags >> >> I hope to be able to login and collect the results around noon pacific time >> on Wed. >> >> -Paul &

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Gilles Gouaillardet
on Wed. > > -Paul > > On Tue, Dec 16, 2014 at 10:48 PM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: >> Paul, >> >> i understand, i will now work on a better way to figure out the required >> flags >> >> the latest nightly s

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Gilles Gouaillardet
; was. > > I can at least test 1 tarball with one set of configure args each evening. > Anything more than that I cannot commit to. > > My scripts are capable of grabbing the v1.8 nightly instead of the rc if > that helps. > > -Paul > > On Tue, Dec 16, 2014 at 10:31 PM

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Gilles Gouaillardet
Ralph, i think that will not work. here is the full story : once upon a time, on solaris, we did not try to compile pthread'ed app without any special parameters. that was a minor annoyance on solaris 10 with old gcc : configure passed a flag (-pthread if i remember correctly) that was not suppo

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-16 Thread Gilles Gouaillardet
" --with-wrapper-cflags="-m64 -mt" \ >LDFLAGS="-mt" --with-wrapper-ldflags="-mt" > > if I am to be sure that orterun and the app are both compiled and linked > with "-mt". > Is that right? > > -Paul > > On Tue, Dec 16, 20

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-16 Thread Gilles Gouaillardet
K connecting to pcp-j-19 > OK connecting to pcp-j-20 > OK connecting to 172.16.0.119 > OK connecting to 172.16.0.120 > OK connecting to 172.18.0.119 > OK connecting to 172.18.0.120 > > > I will report on the 1.8.3 and the non-m64 runs when they are done. > Meanwhile, if y

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-16 Thread Gilles Gouaillardet
2.18.0.120 U 3 26 p.ibp0 > > >Routing Table: IPv6 > > Destination/Mask Gateway Flags Ref Use If > >--- --- - --- --- >- > >::1 ::1 UH 2 0 lo0 > >

Re: [OMPI devel] 1.8.4rc Status

2014-12-16 Thread Gilles Gouaillardet
s necessary >> because libC and libCrun need libthread for a mul- >> tithreaded application. >> >> If you compile and link in separate steps and you com- >> pile with -mt, you might get unexpected results. If you >>

Re: [OMPI devel] 1.8.4rc Status

2014-12-16 Thread Gilles Gouaillardet
cause libC and libCrun need libthread for a mul- >> tithreaded application. >> >> If you compile and link in separate steps and you com- >> pile with -mt, you might get unexpected results. If you >> compile one translation unit w

Re: [OMPI devel] 1.8.4rc Status

2014-12-15 Thread Gilles Gouaillardet
that "-mt" *also* passes -D_REENTRANT to the > preprocessor. > > -Paul > > On Mon, Dec 15, 2014 at 6:07 PM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: >> Paul, >> >> could you please make sure configure added "-D_REENTR

Re: [OMPI devel] 1.8.4rc Status

2014-12-15 Thread Gilles Gouaillardet
Paul, could you please make sure configure added "-D_REENTRANT" to the CFLAGS ? /* otherwise, errno is a global variable instead of a per thread variable, which can explain some weird behaviour. note this should have been already fixed */ assuming -D_REENTRANT is set, could you please give the

Re: [OMPI devel] OMPI devel] [1.8.4rc2] orterun SEGVs on Solaris-10/SPARC

2014-12-12 Thread Gilles Gouaillardet
Ralph, I cannot find a case for the %u format in guess_strlen. And since the default does not invoke va_arg(), it seems strlen is invoked on nnuma instead of arch. Makes sense ? Cheers, Gilles Ralph Castain wrote: >Afraid I’m drawing a blank, Paul - I can’t see how we got to a bad address >do

Re: [OMPI devel] OMPI devel] [OMPI users] OpenMPI 1.8.4 and hwloc in Fedora 14 using a beta gcc 5.0 compiler.

2014-12-12 Thread Gilles Gouaillardet
tion wave” that is causing bug reports. Once we get >thru this, I expect things will settle down again. > >I know Jeff is hosed, and I’m likewise next week. Can someone create a PR to >update 1.8 with these patches? > > >> On Dec 12, 2014, at 12:32 AM, Brice Goglin wr

Re: [OMPI devel] [OMPI users] OpenMPI 1.8.4 and hwloc in Fedora 14 using a beta gcc 5.0 compiler.

2014-12-12 Thread Gilles Gouaillardet
wloc/v1.8 should go to OMPI/master. > > And most of it should go to v1.8 too, but that may require some > backporting rework. I can update hwloc/v1.7 if that helps. > > Brice > > > > On 12/12/2014 03:10, Gilles Gouaillardet wrote: >> Brice, >> >> shou

Re: [OMPI devel] [1.8.4rc3] false report of no loopback interface + segv at exit

2014-12-12 Thread Gilles Gouaillardet
ace, Gilles, else we will fail > to connect. > >> On Dec 11, 2014, at 8:26 PM, Gilles Gouaillardet >> wrote: >> >> Paul, >> >> about the five warnings : >> can you confirm you are running mpirun *not* on n15 nor n16 ? >> if my guess is correct, the

Re: [OMPI devel] [1.8.4rc3] false report of no loopback interface + segv at exit

2014-12-11 Thread Gilles Gouaillardet
Paul, about the five warnings : can you confirm you are running mpirun *not* on n15 nor n16 ? if my guess is correct, then you can get up to 5 warnings : mpirun + 2 orted + 2 mpi tasks do you have any oob_tcp_if_include or oob_tcp_if_exclude settings in your openmpi-mca-params.conf ? here is att

Re: [OMPI devel] Patch proposed: opal_set_using_threads(true) in ompi/runtime/ompi_mpi_init.c is called to late

2014-12-11 Thread Gilles Gouaillardet
George, please allow me to jump in with naive comments ... currently (master) both openib and usnic btl invoke opal_using_threads in component_init() : btl_openib_component_init(int *num_btl_modules, bool enable_progress_threads, bool enable_m

Re: [OMPI devel] [OMPI users] OpenMPI 1.8.4 and hwloc in Fedora 14 using a beta gcc 5.0 compiler.

2014-12-11 Thread Gilles Gouaillardet
Brice, should this fix be backported to both master and v1.8 ? Cheers, Gilles On 2014/12/12 7:46, Brice Goglin wrote: > This problem was fixed in hwloc upstream recently. > > https://github.com/open-mpi/hwloc/commit/790aa2e1e62be6b4f37622959de9ce3766ebc57e > Brice > > > Le 11/12/2014 23:40,

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-10 Thread Gilles Gouaillardet
gainst the Ubuntu system version of hwloc, or the > message must be coming from Slurm. > > >> On Dec 10, 2014, at 6:14 PM, Gilles Gouaillardet >> wrote: >> >> Pim, >> >> at this stage, all i can do is acknowledge your slurm is configured to use >

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-10 Thread Gilles Gouaillardet
he CPU cores assigned are 0 and 1 whereas they are different for the later > started jobs. I attached the output (including lstopo ---of xml output > (called for each task)) for both the working and broken case again. > > Kind regards, > > Pim Schellart > > > > >>

Re: [OMPI devel] OMPI devel] OMPI devel] openmpi and XRC API from ofed-3.12

2014-12-10 Thread Gilles Gouaillardet
changed from XOOB to UDCM. > >Piotr > > >From: devel [devel-boun...@open-mpi.org] on behalf of Gilles Gouaillardet >[gilles.gouaillar...@iferc.org] >Sent: Wednesday, December 10, 2014 09:20 >To: Open MPI Developers >Subject: Re:

Re: [OMPI devel] OMPI devel] openmpi and XRC API from ofed-3.12

2014-12-10 Thread Gilles Gouaillardet
Piotr and all, i issued PR #313 (vs master) based on your patch: https://github.com/open-mpi/ompi/pull/313 could you please have a look at it ? Cheers, Gilles On 2014/12/09 22:07, Gilles Gouaillardet wrote: > Thanks Piotr, > > Based on the ompi community rules, a pr should be ma

Re: [OMPI devel] OMPI devel] openmpi and XRC API from ofed-3.12

2014-12-09 Thread Gilles Gouaillardet
__ >From: devel [devel-boun...@open-mpi.org] on behalf of Gilles Gouaillardet >[gilles.gouaillar...@iferc.org] >Sent: Monday, December 8, 2014 03:27 >To: Open MPI Developers >Subject: Re: [OMPI devel] openmpi and XRC API from ofed-3.12 > >Hi Piotr, > >this is q

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-09 Thread Gilles Gouaillardet
Pim, if you configure OpenMPI with --with-hwloc=external (or something like --with-hwloc=/usr) it is very likely OpenMPI will use the same hwloc library (e.g. the "system" library) that is used by SLURM /* i do not know how Ubuntu packages OpenMPI ... */ The default (e.g. no --with-hwloc parame

Re: [OMPI devel] [OMPI users] Warning about not enough registerable memory on SL6.6

2014-12-08 Thread Gilles Gouaillardet
Folks, FWIW, i observe a similar behaviour on my system. imho, the root cause is OFED has been upgraded from a (quite) older version to the latest 3.12 version here is the relevant part of code (btl_openib.c from the master) : static uint64_t calculate_max_reg (void) { if (0 == stat("/sys/modu
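For readers without the source at hand, here is a standalone sketch of the kind of check calculate_max_reg() performs for mlx4 HCAs; the mlx4_core parameter names and the formula are my assumptions from reading btl_openib.c, not the authoritative code, and the real function also applies a safety factor before warning.

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/stat.h>

static long read_module_param(const char *path, long fallback)
{
    FILE *f = fopen(path, "r");
    long value = fallback;
    if (NULL != f) {
        if (1 != fscanf(f, "%ld", &value)) {
            value = fallback;
        }
        fclose(f);
    }
    return value;
}

int main(void)
{
    struct stat st;
    if (0 == stat("/sys/module/mlx4_core", &st)) {
        long log_num_mtt      = read_module_param("/sys/module/mlx4_core/parameters/log_num_mtt", 0);
        long log_mtts_per_seg = read_module_param("/sys/module/mlx4_core/parameters/log_mtts_per_seg", 0);
        uint64_t page_size    = (uint64_t) sysconf(_SC_PAGESIZE);
        uint64_t max_reg = (1ULL << log_num_mtt) * (1ULL << log_mtts_per_seg) * page_size;
        printf("estimated max registerable memory: %llu bytes\n",
               (unsigned long long) max_reg);
        /* openib warns when this estimate is noticeably smaller than physical RAM */
    }
    return 0;
}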

Re: [OMPI devel] openmpi and XRC API from ofed-3.12

2014-12-07 Thread Gilles Gouaillardet
Hi Piotr, this is quite an old thread now, but i did not see any support for XRC with ofed 3.12 yet (neither in trunk nor in v1.8) my understanding is that Bull already did something similar for the v1.6 series, so let me put this the other way around : does Bull have any plan to contribute this wo

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-12-02 Thread Gilles Gouaillardet
), and that >> involves a lot of change. >> >> I'll instead try to provide a faster error response so it is clearer what >> is happening, hopefully letting the user fix the problem by turning on the >> loopback interface. >> >> >> On Nov 25,

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-02 Thread Gilles Gouaillardet
resolves the pmi linkage problem. > > >> On Dec 1, 2014, at 8:09 PM, Gilles Gouaillardet >> wrote: >> >> $ srun --version >> slurm 2.6.6-VENDOR_PROVIDED >> >> $ srun --mpi=pmi2 -n 1 ~/hw >> I am 0 / 1 >> >> $ srun -n 1 ~/hw >>

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-01 Thread Gilles Gouaillardet
gt; >> On Dec 1, 2014, at 7:49 PM, Gilles Gouaillardet >> wrote: >> >> I d like to make a step back ... >> >> i previously tested with slurm 2.6.0, and it complained about the >> slurm_verbose symbol that is defined in libslurm.so >> so with slur

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-01 Thread Gilles Gouaillardet
e that requires it, and it won't hurt > anything to do so. > > >> On Dec 1, 2014, at 6:03 PM, Gilles Gouaillardet >> wrote: >> >> Jeff, >> >> FWIW, you can read my analysis of what is going wrong at >> http://www.open-mpi.org/community/lists/pmix

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-01 Thread Gilles Gouaillardet
Jeff, FWIW, you can read my analysis of what is going wrong at http://www.open-mpi.org/community/lists/pmix-devel/2014/11/0293.php bottom line, i agree this is a slurm issue (slurm plugin should depend on libslurm, but they do not, yet) a possible workaround would be to make the pmi component a

Re: [OMPI devel] Question about tight integration with not-yet-supported queuing systems

2014-12-01 Thread Gilles Gouaillardet
what i am interested in :-) >> could you be more specific (e.g. point me to the functions, since the >> OpenLava doc is pretty minimal ...) >> >> the goal here is to spawn the orted daemons as part of the parallel job, >> so these daemons are accounted within t

Re: [OMPI devel] OMPI devel] OMPI devel] race condition in abort can cause mpirun v1.8 hang

2014-11-26 Thread Gilles Gouaillardet
MPI_PARAM_CHECK"]=" 1" i will try on a centos7 box from now. in the mean time, can you check you config.status and try again with mpirun --mca mpi_param_check true Cheers, Gilles On 2014/11/27 10:06, Gilles Gouaillardet wrote: > I will double check this(afk right now) > A

Re: [OMPI devel] OMPI devel] OMPI devel] race condition in abort can cause mpirun v1.8 hang

2014-11-26 Thread Gilles Gouaillardet
>This was indeed with a debug build. I wouldn’t expect a segfault even with an >optimized build, though - I would expect an MPI error, yes? > > > > >On Nov 26, 2014, at 4:26 PM, Gilles Gouaillardet > wrote: > > >I will have a look > >Btw, i was runn

Re: [OMPI devel] OMPI devel] race condition in abort can cause mpirun v1.8 hang

2014-11-26 Thread Gilles Gouaillardet
014, at 8:46 AM, Ralph Castain wrote: > > >Hmmm….yeah, I know we saw this and resolved it in the trunk, but it looks like >the fix indeed failed to come over to 1.8. I’ll take a gander (pretty sure I >remember how I fixed it) - thanks! > >On Nov 26, 2014, at 12:03 AM, Gilles Gouail

[OMPI devel] race condition in abort can cause mpirun v1.8 hang

2014-11-26 Thread Gilles Gouaillardet
Ralph, i noted several hangs in mtt with the v1.8 branch. a simple way to reproduce it is to use the MPI_Errhandler_fatal_f test from the intel_tests suite, invoke mpirun on one node and run the tasks on another node : node0$ mpirun -np 3 -host node1 --mca btl tcp,self ./MPI_Errhandler_fatal_f

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Gilles Gouaillardet
Ralph and Paul, On 2014/11/26 10:37, Ralph Castain wrote: > So it looks like the issue isn't so much with our code as it is with the OS > stack, yes? We aren't requiring that the loopback be "up", but the stack is > in order to establish the connection, even when we are trying a non-lo > interf

Re: [OMPI devel] [OMPI users] MPI_Neighbor_alltoallw fails with mpi-1.8.3

2014-11-21 Thread Gilles Gouaillardet
Hi Ghislain, that sounds like a bug in MPI_Dist_graph_create :-( you can use MPI_Dist_graph_create_adjacent instead : MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD, degrees, &targets[0], &weights[0], degrees, &targets[0], &weights[0], info, rankReordering, &commGraph); it
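A minimal, self-contained version of the suggested workaround is sketched below; the ring topology and variable values are illustrative assumptions, and only the MPI_Dist_graph_create_adjacent call mirrors the one quoted above.

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Comm commGraph;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* each rank talks to its left and right neighbour on a ring */
    int degrees = 2;
    int targets[2] = { (rank + size - 1) % size, (rank + 1) % size };
    int weights[2] = { 1, 1 };
    int rankReordering = 0;

    MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                                   degrees, &targets[0], &weights[0],   /* in-edges  */
                                   degrees, &targets[0], &weights[0],   /* out-edges */
                                   MPI_INFO_NULL, rankReordering, &commGraph);

    /* commGraph can now be used with MPI_Neighbor_alltoallw and friends */
    MPI_Comm_free(&commGraph);
    MPI_Finalize();
    return 0;
}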

Re: [OMPI devel] Question about tight integration with not-yet-supported queuing systems

2014-11-18 Thread Gilles Gouaillardet
y > Uppsala University, Sweden > marc.hoepp...@bils.se > >> On 18 Nov 2014, at 08:40, Gilles Gouaillardet >> wrote: >> >> Hi Marc, >> >> OpenLava is based on a pretty old version of LSF (4.x if i remember >> correctly) >> and i do not think L

Re: [OMPI devel] Question about tight integration with not-yet-supported queuing systems

2014-11-18 Thread Gilles Gouaillardet
Hi Marc, OpenLava is based on a pretty old version of LSF (4.x if i remember correctly) and i do not think LSF had support for parallel jobs tight integration at that time. my understanding is that basically, there are two kinds of direct integration : - mpirun launch: mpirun spawns orted via the A

Re: [OMPI devel] Error in version 1.8.3?!

2014-11-13 Thread Gilles Gouaillardet
Hartmut, this is a known bug. in the meantime, can you give a try to 1.8.4rc1 ? http://www.open-mpi.org/software/ompi/v1.8/downloads/openmpi-1.8.4rc1.tar.gz /* if i remember correctly, this is fixed already in the rc1 */ Cheers, Gilles On 2014/11/13 19:48, Hartmut Häfner (SCC) wrote: > Dear d

[OMPI devel] oshmem: put does not work with btl/vader if knem is enabled

2014-11-12 Thread Gilles Gouaillardet
Folks, I found (at least) two issues with oshmem put if btl/vader is used with knem enabled : $ oshrun -np 2 --mca btl vader,self ./oshmem_max_reduction -- SHMEM_ABORT was invoked on rank 0 (pid 11936, host=soleil) with error

Re: [OMPI devel] OMPI devel] Jenkins vs master (and v1.8)

2014-11-11 Thread Gilles Gouaillardet
Thanks Mike, BTW what is the distro running on your test cluster ? Mike Dubman wrote: >ok, I disabled vader tests in SHMEM and it passes. > >it can be requested from jenkins by specifying "vader" in PR comment line. > > >On Tue, Nov 11, 2014 at 11:04 AM, Gilles Go
