Re: [OMPI devel] Master hangs in opal_fifo test

2015-02-04 Thread Gilles Gouaillardet
Nathan, imho, this is a compiler bug and only two versions are affected : - intel icc 14.0.0.080 (aka 2013sp1) - intel icc 14.0.1.106 (aka 2013sp1u1) /* note the bug only occurs with -O1 and higher optimization levels */ attached is a simple reproducer. a simple workaround is to configure

Re: [OMPI devel] Master hangs in opal_LIFO test

2015-02-03 Thread Gilles Gouaillardet
Paul, George and i were able to reproduce this issue with icc 14.0 but not with icc 14.3 and later. i am trying to see how the difference/bug could be handled automatically. Cheers, Gilles On 2015/02/03 16:18, Paul Hargrove wrote: > CORRECTION: > > It is the opal_lifo (not fifo) test which hung

Re: [OMPI devel] Great meeting!

2015-01-30 Thread Gilles Gouaillardet
Hi Jeff, let me give an update on the --with-threads configure option. it has been removed from the master : commit 7a55d49ca78bcc157749c04027515f12b026ec33 Author: Gilles Gouaillardet <gilles.gouaillar...@iferc.org> Date: Tue Oct 21 19:13:19 2014 +0900 con

Re: [OMPI devel] One sided tests

2015-01-21 Thread Gilles Gouaillardet
George, a tentative fix is available at https://github.com/open-mpi/ompi/pull/355 i asked Nathan to review it before it lands into the master Cheers, Gilles On 2015/01/22 7:08, George Bosilca wrote: > Current trunk compiled with any compiler (gcc or icc) fails the one sided > tests from

Re: [OMPI devel] btl_openib.c:1200: mca_btl_openib_alloc: Assertion `qp != 255' failed

2015-01-20 Thread Gilles Gouaillardet
tatype. Because the stride is 0, this datatype a memory >> layout that includes 2 times the same int. I'm not sure this was indeed >> intended... >> >> George. >> >> >> On Mon, Jan 19, 2015 at 12:17 AM, Gilles Gouaillardet >> <gilles.gouail

Re: [OMPI devel] Failures

2015-01-19 Thread Gilles Gouaillardet
compiler version Cheers, Gilles On 2015/01/17 0:19, George Bosilca wrote: > Your patch solve the issue with opal_tree. The opal_lifo remains broken. > > George. > > > On Fri, Jan 16, 2015 at 5:12 AM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote:

Re: [OMPI devel] btl_openib.c:1200: mca_btl_openib_alloc: Assertion `qp != 255' failed

2015-01-19 Thread Gilles Gouaillardet
e. please note Jeff recently pushed a patch related to that and this message might be a false positive. Cheers, Gilles On 2015/01/19 14:17, Gilles Gouaillardet wrote: > Adrian, > > i just fixed this in the master > (https://github.com/open-mpi/ompi/commit/d14daf40d041f7a0a8

Re: [OMPI devel] btl_openib.c:1200: mca_btl_openib_alloc: Assertion `qp != 255' failed

2015-01-19 Thread Gilles Gouaillardet
Adrian, i just fixed this in the master (https://github.com/open-mpi/ompi/commit/d14daf40d041f7a0a8e9d85b3bfd5eb570495fd2). the root cause is that a corner case was not handled correctly : MPI_Type_hvector(2, 1, 0, MPI_INT, ); type has extent = 4 *but* size = 8. ob1 used to test only the
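The size/extent mismatch in this corner case follows from plain arithmetic. The sketch below is a hypothetical model (type_info and hvector_info are made-up names, not Open MPI's datatype engine). It assumes a non-negative byte stride and a base type such as MPI_INT whose lower bound is 0 and whose extent equals its size. The size counts every int the type describes. The extent only spans the bytes the layout covers, so with a stride of 0 the two ints overlap and the size ends up twice the extent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not Open MPI's datatype engine.  Assumes a
 * non-negative byte stride and a base type (e.g. MPI_INT) whose lower
 * bound is 0 and whose extent equals its size. */
typedef struct {
    size_t size;   /* bytes of data actually described by the type */
    size_t extent; /* span of memory from lower bound to upper bound */
} type_info;

static type_info hvector_info(int count, int blocklen, size_t stride_bytes,
                              size_t base_size)
{
    type_info t = {0, 0};

    assert(count >= 0 && blocklen >= 0);
    if (count == 0 || blocklen == 0)
        return t;
    /* Every element of every block is data, overlapping or not. */
    t.size = (size_t)count * (size_t)blocklen * base_size;
    /* Blocks start at 0, stride, 2*stride, ...; the last one ends at
     * (count - 1) * stride + blocklen * base_size.  With stride == 0 all
     * blocks start at offset 0 and the extent is a single block. */
    t.extent = (size_t)(count - 1) * stride_bytes
             + (size_t)blocklen * base_size;
    return t;
}
```

With a 4-byte int, hvector_info(2, 1, 0, 4) gives size 8 and extent 4, the same numbers as in the post. Any code path that checks only one of the two values cannot tell this type apart from a plain single int.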

Re: [OMPI devel] pthreads (was: Re: RFC: remove --disable-smp-locks)

2015-01-16 Thread Gilles Gouaillardet
Folks, i pushed two commits in order to remove the --with-threads option and the dead code : commit 7a55d49ca78bcc157749c04027515f12b026ec33 Author: Gilles Gouaillardet <gilles.gouaillar...@iferc.org> Date: Tue Oct 21 19:13:19 2014 +0900 configur

Re: [OMPI devel] Failures

2015-01-16 Thread Gilles Gouaillardet
George, i pushed https://github.com/open-mpi/ompi/commit/ac16970d21d21f529f1ec01ebe0520843227475b in order to get the intel compiler to work with ompi Cheers, Gilles On 2015/01/16 17:29, Gilles Gouaillardet wrote: > George, > > i was unable to reproduce the hang with icc 14.0.3.174 an

Re: [OMPI devel] Failures

2015-01-16 Thread Gilles Gouaillardet
George, i was unable to reproduce the hang with icc 14.0.3.174 and greater on a RHEL6 like distro. i was able to reproduce the opal_tree failure and found two possible workarounds : a) manually compile opal/class/opal_tree.lo *without* the -finline-functions flag b) update

Re: [OMPI devel] pthreads (was: Re: RFC: remove --disable-smp-locks)

2015-01-07 Thread Gilles Gouaillardet
co.com> wrote: >On Jan 7, 2015, at 4:25 AM, Gilles Gouaillardet ><gilles.gouaillar...@iferc.org> wrote: > >> Talking about thread support ... >> >> i made an RFC several months ago in order to remove the >> --with-threads option from configure >> >

Re: [OMPI devel] RFC: remove --disable-smp-locks

2015-01-07 Thread Gilles Gouaillardet
Talking about thread support ... i made an RFC several months ago in order to remove the --with-threads option from configure /* ompi requires pthreads, no more, no less */ it was accepted, but i could not find the time to implement it ... basically, i can see three steps : 1) remove the

Re: [OMPI devel] ompi-master build error : make can require autotools

2015-01-06 Thread Gilles Gouaillardet
e recent automake version (but which one ?) Cheers, Gilles On 2015/01/07 2:02, Dave Goodell (dgoodell) wrote: > On Jan 5, 2015, at 8:40 PM, Gilles Gouaillardet > <gilles.gouaillar...@iferc.org> wrote: > >> Dave, >> >> what if you do >> >> touch

Re: [OMPI devel] problem running jobs on ompi-master

2014-12-26 Thread Gilles Gouaillardet
Edgar, First, make sure your master includes https://github.com/open-mpi/ompi/commit/05af80b3025dbb95bdd4280087450791291d7219 If this is not enough, try with --mca coll ^ml Hope this helps Gilles. Edgar Gabriel wrote: >I have some problems running jobs with ompi-master

Re: [OMPI devel] OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-612-g05af80b

2014-12-24 Thread Gilles Gouaillardet
Ralph, I had second thoughts on what i wrote earlier, and i think the code is correct, i.e. reply cannot be used uninitialized. That being said, i think reply should be initialized to null and OBJ_RELEASE'd if not null on exit in order to avoid a memory leak. Sorry for the confusion, Gilles

[OMPI devel] mpirun hang (regression in bffb2b7a4bb49c9188d942201b8a8f04872ff63c)

2014-12-24 Thread Gilles Gouaillardet
Ralph, i tried to debug the issue reported by Siegmar at http://www.open-mpi.org/community/lists/users/2014/12/26052.php i have not been able to try this on a heterogeneous cluster yet, but i could reproduce a hang with 2 nodes and 3 tasks : mpirun -host node0,node1 -np 3 --mca btl tcp,self

Re: [OMPI devel] Different behaviour with MPI_IN_PLACE in MPI_Reduce_scatter() and MPI_Ireduce_scatter()

2014-12-23 Thread Gilles Gouaillardet
Lisandro, i fixed this in the master and made a PR for v1.8. this is a one liner, and you can find it at https://github.com/ggouaillardet/ompi-release/commit/0e478c1191715fff37e4deb56f8891774db62775 Cheers, Gilles On 2014/12/23 23:43, Lisandro Dalcin wrote: > On 28 September 2014 at 19:13,

Re: [OMPI devel] ompi-master build error : make can require autotools

2014-12-22 Thread Gilles Gouaillardet
e affected) Cheers, Gilles On Tue, Dec 23, 2014 at 2:26 AM, Dave Goodell (dgoodell) <dgood...@cisco.com > wrote: > On Dec 22, 2014, at 2:42 AM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: > > > Jeff and all, > > > > i just fo

[OMPI devel] ompi master, libfabric and static libraries

2014-12-22 Thread Gilles Gouaillardet
Jeff, MTT reported some errors when building some test suites : http://mtt.open-mpi.org/index.php?do_redir=2219 the root cause was some missing flags in the wrappers. i fixed that in 8976dcf6101412f6bd0080764d19a3e9d4edf570 there is now a second issue : libfabric requires libnl, but the -lnl

[OMPI devel] ompi-master build error : make can require autotools

2014-12-22 Thread Gilles Gouaillardet
Jeff and all, i just found "by accident" that make can require autotools. for example: from (generated) ompi/include/Makefile : $(srcdir)/mpi.h.in: $(am__configure_deps) ($(am__cd) $(top_srcdir) && $(AUTOHEADER)) rm -f stamp-h2 touch $@ and $(am__configure_deps) is a
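Here is why a plain make can end up running $(AUTOHEADER). make remakes a target when it is missing or when any prerequisite has a newer modification time. So if any file that $(am__configure_deps) expands to is newer than mpi.h.in, the rule quoted above fires. One way that can happen is a checkout that does not preserve timestamps (git does not), though that is an assumption about how the tree was obtained. Below is a minimal model of that decision, with a hypothetical function name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical model of make's out-of-date test: a target is remade when
 * it does not exist or when any prerequisite is strictly newer than it. */
static bool needs_rebuild(bool target_exists, time_t target_mtime,
                          const time_t *prereq_mtimes, size_t n_prereqs)
{
    assert(n_prereqs == 0 || prereq_mtimes != NULL);
    if (!target_exists)
        return true;
    for (size_t i = 0; i < n_prereqs; i++) {
        if (prereq_mtimes[i] > target_mtime)
            return true; /* e.g. configure deps newer than mpi.h.in */
    }
    return false;
}
```

With prerequisite times {100, 200} and a target stamped 150, needs_rebuild() returns true. This is the situation where make re-runs autoheader even though nobody edited the autotools inputs.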

Re: [OMPI devel] libfabric, config.h and hwloc

2014-12-19 Thread Gilles Gouaillardet
and there is no config.h in my include path, and hence make fails. Cheers, Gilles On 2014/12/19 4:12, Jeff Squyres (jsquyres) wrote: > On Dec 18, 2014, at 3:13 AM, Gilles Gouaillardet > <gilles.gouaillar...@iferc.org> wrote: > >> currently, ompi master cannot be built if configu

[OMPI devel] libfabric, config.h and hwloc

2014-12-18 Thread Gilles Gouaillardet
Jeff, currently, ompi master cannot be built if configured with --without-hwloc *and without* --without-libfabric. the root cause is that HAVE_CONFIG_H is defined but no config.h file is found. i dug a bit and found that config.h is taken from a hwloc directory (if the --without-hwloc option is

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Gilles Gouaillardet
Routing Table: IPv6 >> Destination/Mask Gateway Flags Ref Use >> If >> --- --- - --- --- >> - >> ::1 ::1 UH 2

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Gilles Gouaillardet
enmpi-v1.8.4rc4 + adding -D_REENTRANT to CFLAGS and wrapper-cflags >> 3) openmpi-v1.8.4rc4 + adding -mt to CFLAGS and wrapper-cflags >> >> I hope to be able to login and collect the results around noon pacific time >> on Wed. >> >> -Paul >> >

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Gilles Gouaillardet
ed. > > -Paul > > On Tue, Dec 16, 2014 at 10:48 PM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: >> Paul, >> >> i understand, i will now work on a better way to figure out the required >> flags >> >> the latest nightly snapsh

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Gilles Gouaillardet
Ralph, i think that will not work. here is the full story : once upon a time, on solaris, we did not try to compile pthread'ed app without any special parameters. that was a minor annoyance on solaris 10 with old gcc : configure passed a flag (-pthread if i remember correctly) that was not

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-16 Thread Gilles Gouaillardet
h-wrapper-cflags="-m64 -mt" \ >LDFLAGS="-mt" --with-wrapper-ldflags="-mt" > > if I am to be sure that orterun and the app are both compiled and linked > with "-mt". > Is that right? > > -Paul > > On Tue, Dec 16, 2014 at 6:25

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-16 Thread Gilles Gouaillardet
connecting to pcp-j-19 > OK connecting to pcp-j-20 > OK connecting to 172.16.0.119 > OK connecting to 172.16.0.120 > OK connecting to 172.18.0.119 > OK connecting to 172.18.0.120 > > > I will report on the 1.8.3 and the non-m64 runs when they are done. > Meanwhile, if you ha

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-16 Thread Gilles Gouaillardet
t; >  Destination/Mask            Gateway                   Flags Ref   Use    If   >  > >--- --- - --- --- >-  > >::1                         ::1                         UH      2       0 lo0   >  > >fe80::/10                   fe8

Re: [OMPI devel] 1.8.4rc Status

2014-12-16 Thread Gilles Gouaillardet
ead. The -mt option is necessary >> because libC and libCrun need libthread for a >> multithreaded application. >> >> If you compile and link in separate steps and you >> compile with -mt, you might get unexpected results. If

Re: [OMPI devel] 1.8.4rc Status

2014-12-16 Thread Gilles Gouaillardet
y >> because libC and libCrun need libthread for a >> multithreaded application. >> >> If you compile and link in separate steps and you >> compile with -mt, you might get unexpected results. If you >> compile

Re: [OMPI devel] 1.8.4rc Status

2014-12-15 Thread Gilles Gouaillardet
ay that "-mt" *also* passes -D_REENTRANT to the > preprocessor. > > -Paul > > On Mon, Dec 15, 2014 at 6:07 PM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: >> Paul, >> >> could you please make sure configure added "-D_REENTR

Re: [OMPI devel] 1.8.4rc Status

2014-12-15 Thread Gilles Gouaillardet
Paul, could you please make sure configure added "-D_REENTRANT" to the CFLAGS ? /* otherwise, errno is a global variable instead of a per thread variable, which can explain some weird behaviour. note this should already have been fixed */ assuming -D_REENTRANT is set, could you please give the

Re: [OMPI devel] OMPI devel] [1.8.4rc2] orterun SEGVs on Solaris-10/SPARC

2014-12-12 Thread Gilles Gouaillardet
Ralph, I cannot find a case for the %u format in guess_strlen. And since the default case does not invoke va_arg(), it seems strlen is invoked on nnuma instead of arch. Makes sense ? Cheers, Gilles Ralph Castain wrote: >Afraid I’m drawing a blank, Paul - I can’t see how we got
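The failure described here is a general varargs pitfall. Every conversion a length estimator recognizes must consume exactly one argument with va_arg(). If one is skipped, every later argument shifts by one, and a following %s hands an integer (here nnuma) to strlen(). Below is a simplified, hypothetical estimator. estimate_len is not the Open MPI function, and it only handles %s, %d, %u and %% without flags, widths or length modifiers. It consumes an argument for each conversion and refuses unknown ones instead of silently falling through:

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical upper-bound estimator for a tiny printf subset.
 * Returns (size_t)-1 for a conversion it does not understand. */
static size_t estimate_len(const char *fmt, ...)
{
    va_list ap;
    size_t len = 0;

    assert(fmt != NULL);
    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%') {
            len++;
            continue;
        }
        p++;
        switch (*p) {
        case 's': {
            const char *s = va_arg(ap, const char *);
            len += (s != NULL) ? strlen(s) : 6; /* "(null)" */
            break;
        }
        case 'd':
            (void)va_arg(ap, int);      /* consume it even if unused */
            len += 11;                  /* "-2147483648" for a 32-bit int */
            break;
        case 'u':
            (void)va_arg(ap, unsigned int);
            len += 10;                  /* "4294967295" for a 32-bit unsigned */
            break;
        case '%':
            len++;
            break;
        case '\0':
            p--;                        /* lone trailing '%': stop here */
            break;
        default:
            /* Unknown conversion: its argument type is unknown, so bail
             * out rather than desynchronize the argument list. */
            va_end(ap);
            return (size_t)-1;
        }
    }
    va_end(ap);
    return len;
}
```

For example, estimate_len("%u cores, arch %s", 4u, "sparc") returns 28: 13 literal characters, a 10-character worst case for the unsigned, and 5 for "sparc". That is an upper bound on the 19 characters snprintf() would actually produce.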

Re: [OMPI devel] OMPI devel] [OMPI users] OpenMPI 1.8.4 and hwloc in Fedora 14 using a beta gcc 5.0 compiler.

2014-12-12 Thread Gilles Gouaillardet
12:32 AM, Brice Goglin <brice.gog...@inria.fr> wrote: >> >> On 12/12/2014 07:36, Gilles Gouaillardet wrote: >>> Brice, >>> >>> ompi master is based on hwloc 1.9.1, isn't it ? >> >> Yes sorry, I am often confused by all these OMPI

Re: [OMPI devel] [OMPI users] OpenMPI 1.8.4 and hwloc in Fedora 14 using a beta gcc 5.0 compiler.

2014-12-12 Thread Gilles Gouaillardet
should go to OMPI/master. > > And most of it should go to v1.8 too, but that may require some > backporting rework. I can update hwloc/v1.7 if that helps. > > Brice > > > > On 12/12/2014 03:10, Gilles Gouaillardet wrote: >> Brice, >> >> should this fi

Re: [OMPI devel] [1.8.4rc3] false report of no loopback interface + segv at exit

2014-12-12 Thread Gilles Gouaillardet
ace, Gilles, else we will fail > to connect. > >> On Dec 11, 2014, at 8:26 PM, Gilles Gouaillardet >> <gilles.gouaillar...@iferc.org> wrote: >> >> Paul, >> >> about the five warnings : >> can you confirm you are running mpirun *not* on n15 nor

Re: [OMPI devel] [1.8.4rc3] false report of no loopback interface + segv at exit

2014-12-11 Thread Gilles Gouaillardet
Paul, about the five warnings : can you confirm you are running mpirun *not* on n15 nor n16 ? if my guess is correct, then you can get up to 5 warnings : mpirun + 2 orted + 2 mpi tasks do you have any oob_tcp_if_include or oob_tcp_if_exclude settings in your openmpi-mca-params.conf ? here is

Re: [OMPI devel] Patch proposed: opal_set_using_threads(true) in ompi/runtime/ompi_mpi_init.c is called to late

2014-12-11 Thread Gilles Gouaillardet
George, please allow me to jump in with naive comments ... currently (master) both the openib and usnic btls invoke opal_using_threads in component_init() : btl_openib_component_init(int *num_btl_modules, bool enable_progress_threads, bool

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-10 Thread Gilles Gouaillardet
t the Ubuntu system version of hwloc, or the > message must be coming from Slurm. > > >> On Dec 10, 2014, at 6:14 PM, Gilles Gouaillardet >> <gilles.gouaillar...@iferc.org> wrote: >> >> Pim, >> >> at this stage, all i can do is acknowledge your

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-10 Thread Gilles Gouaillardet
; the CPU cores assigned are 0 and 1 whereas they are different for the later > started jobs. I attached the output (including lstopo ---of xml output > (called for each task)) for both the working and broken case again. > > Kind regards, > > Pim Schellart > > > > >>

Re: [OMPI devel] OMPI devel] OMPI devel] openmpi and XRC API from ofed-3.12

2014-12-10 Thread Gilles Gouaillardet
l openib: connecting XRC queues has changed from XOOB to UDCM. > >Piotr > > >From: devel [devel-boun...@open-mpi.org] on behalf of Gilles Gouaillardet >[gilles.gouaillar...@iferc.org] >Sent: Wednesday, December 10, 2014 09:20 >To: Op

Re: [OMPI devel] OMPI devel] openmpi and XRC API from ofed-3.12

2014-12-10 Thread Gilles Gouaillardet
Piotr and all, i issued PR #313 (vs master) based on your patch: https://github.com/open-mpi/ompi/pull/313 could you please have a look at it ? Cheers, Gilles On 2014/12/09 22:07, Gilles Gouaillardet wrote: > Thanks Piotr, > > Based on the ompi community rules, a pr should b

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-09 Thread Gilles Gouaillardet
Pim, if you configure OpenMPI with --with-hwloc=external (or something like --with-hwloc=/usr) it is very likely OpenMPI will use the same hwloc library (i.e. the "system" library) that is used by SLURM /* i do not know how Ubuntu packages OpenMPI ... */ The default (i.e. no --with-hwloc

Re: [OMPI devel] [OMPI users] Warning about not enough registerable memory on SL6.6

2014-12-08 Thread Gilles Gouaillardet
Folks, FWIW, i observe a similar behaviour on my system. imho, the root cause is that OFED has been upgraded from a (quite) old version to the latest 3.12 version. here is the relevant part of code (btl_openib.c from the master) : static uint64_t calculate_max_reg (void) { if (0 ==
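For mlx4-based HCAs, the Open MPI FAQ gives the registerable-memory limit as max_reg_mem = 2^log_num_mtt * 2^log_mtts_per_seg * PAGE_SIZE. Here log_num_mtt and log_mtts_per_seg are mlx4_core module parameters. The arithmetic is sketched below with a hypothetical helper. mlx4_max_reg is not the body of calculate_max_reg(), which is not shown here. One plausible reading of the warning is that an OFED upgrade changed or removed those parameters, so a computation based on them underestimates the limit. That is an assumption, not a confirmed diagnosis.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: registerable-memory bound for an mlx4 HCA,
 * max_reg = 2^log_num_mtt * 2^log_mtts_per_seg * page_size.
 * Sketch of the arithmetic only; no overflow check on the final product. */
static uint64_t mlx4_max_reg(unsigned log_num_mtt, unsigned log_mtts_per_seg,
                             uint64_t page_size)
{
    assert(log_num_mtt + log_mtts_per_seg < 64);
    return ((uint64_t)1 << (log_num_mtt + log_mtts_per_seg)) * page_size;
}
```

With log_num_mtt = 20, log_mtts_per_seg = 3 and 4 KiB pages, the limit is 2^35 bytes (32 GiB).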

Re: [OMPI devel] openmpi and XRC API from ofed-3.12

2014-12-07 Thread Gilles Gouaillardet
Hi Piotr, this is quite an old thread now, but i did not see any support for XRC with ofed 3.12 yet (neither in trunk nor in v1.8). my understanding is that Bull already did something similar for the v1.6 series, so let me put this the other way around : does Bull have any plan to contribute this

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-12-02 Thread Gilles Gouaillardet
if nothing else), and that >> involves a lot of change. >> >> I'll instead try to provide a faster error response so it is clearer what >> is happening, hopefully letting the user fix the problem by turning on the >> loopback interface. >> >> >>

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-01 Thread Gilles Gouaillardet
>> On Dec 1, 2014, at 7:49 PM, Gilles Gouaillardet >> <gilles.gouaillar...@iferc.org> wrote: >> >> I'd like to take a step back ... >> >> i previously tested with slurm 2.6.0, and it complained about the >> slurm_verbose symbol that is defined

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-01 Thread Gilles Gouaillardet
place that requires it, and it won't hurt > anything to do so. > > >> On Dec 1, 2014, at 6:03 PM, Gilles Gouaillardet >> <gilles.gouaillar...@iferc.org> wrote: >> >> Jeff, >> >> FWIW, you can read my analysis of what is going wrong at >> http

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-01 Thread Gilles Gouaillardet
Jeff, FWIW, you can read my analysis of what is going wrong at http://www.open-mpi.org/community/lists/pmix-devel/2014/11/0293.php bottom line, i agree this is a slurm issue (slurm plugins should depend on libslurm, but they do not, yet) a possible workaround would be to make the pmi component a

Re: [OMPI devel] Question about tight integration with not-yet-supported queuing systems

2014-12-01 Thread Gilles Gouaillardet
> identical to LSF 4-6, which should be on the web). >> >> The output of env can be found here: >> https://dl.dropboxusercontent.com/u/1918141/env.txt [6] >> >> /M >> >> Marc P. Hoeppner, PhD >> Team Leader >> BILS Genome Annotation Platform >

Re: [OMPI devel] OMPI devel] OMPI devel] race condition in abort can cause mpirun v1.8 hang

2014-11-26 Thread Gilles Gouaillardet
MPI_PARAM_CHECK"]=" 1" i will try on a centos7 box now. in the meantime, can you check your config.status and try again with mpirun --mca mpi_param_check true Cheers, Gilles On 2014/11/27 10:06, Gilles Gouaillardet wrote: > I will double check this(afk right now) > A

Re: [OMPI devel] OMPI devel] OMPI devel] race condition in abort can cause mpirun v1.8 hang

2014-11-26 Thread Gilles Gouaillardet
the same result. > > >This was indeed with a debug build. I wouldn’t expect a segfault even with an >optimized build, though - I would expect an MPI error, yes? > > > > >On Nov 26, 2014, at 4:26 PM, Gilles Gouaillardet ><gilles.gouaillar...@gmail.com> wrote:

Re: [OMPI devel] OMPI devel] race condition in abort can cause mpirun v1.8 hang

2014-11-26 Thread Gilles Gouaillardet
>On Nov 26, 2014, at 8:46 AM, Ralph Castain <r...@open-mpi.org> wrote: > > >Hmmm….yeah, I know we saw this and resolved it in the trunk, but it looks like >the fix indeed failed to come over to 1.8. I’ll take a gander (pretty sure I >remember how I fixed it) - thanks! >

[OMPI devel] race condition in abort can cause mpirun v1.8 hang

2014-11-26 Thread Gilles Gouaillardet
Ralph, i noted several hangs in mtt with the v1.8 branch. a simple way to reproduce it is to use the MPI_Errhandler_fatal_f test from the intel_tests suite, invoke mpirun on one node and run the tasks on another node : node0$ mpirun -np 3 -host node1 --mca btl tcp,self ./MPI_Errhandler_fatal_f

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Gilles Gouaillardet
Ralph and Paul, On 2014/11/26 10:37, Ralph Castain wrote: > So it looks like the issue isn't so much with our code as it is with the OS > stack, yes? We aren't requiring that the loopback be "up", but the stack is > in order to establish the connection, even when we are trying a non-lo >

Re: [OMPI devel] [OMPI users] MPI_Neighbor_alltoallw fails with mpi-1.8.3

2014-11-21 Thread Gilles Gouaillardet
Hi Ghislain, that sounds like a bug in MPI_Dist_graph_create :-( you can use MPI_Dist_graph_create_adjacent instead : MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD, degrees, [0], [0], degrees, [0], [0], info, rankReordering, ); it does not crash and as far as i

Re: [OMPI devel] Question about tight integration with not-yet-supported queuing systems

2014-11-18 Thread Gilles Gouaillardet
y > Uppsala University, Sweden > marc.hoepp...@bils.se > >> On 18 Nov 2014, at 08:40, Gilles Gouaillardet >> <gilles.gouaillar...@iferc.org> wrote: >> >> Hi Marc, >> >> OpenLava is based on a pretty old version of LSF (4.x if i remember >> co

Re: [OMPI devel] Question about tight integration with not-yet-supported queuing systems

2014-11-18 Thread Gilles Gouaillardet
Hi Marc, OpenLava is based on a pretty old version of LSF (4.x if i remember correctly) and i do not think LSF supported tight integration of parallel jobs at that time. my understanding is that basically, there are two kinds of direct integration : - mpirun launch: mpirun spawns orted via the

Re: [OMPI devel] Error in version 1.8.3?!

2014-11-13 Thread Gilles Gouaillardet
Hartmut, this is a known bug. in the meantime, can you give 1.8.4rc1 a try ? http://www.open-mpi.org/software/ompi/v1.8/downloads/openmpi-1.8.4rc1.tar.gz /* if i remember correctly, this is already fixed in rc1 */ Cheers, Gilles On 2014/11/13 19:48, Hartmut Häfner (SCC) wrote: > Dear

[OMPI devel] oshmem: put does not work with btl/vader if knem is enabled

2014-11-12 Thread Gilles Gouaillardet
Folks, I found (at least) two issues with oshmem put if btl/vader is used with knem enabled : $ oshrun -np 2 --mca btl vader,self ./oshmem_max_reduction -- SHMEM_ABORT was invoked on rank 0 (pid 11936, host=soleil) with

Re: [OMPI devel] OMPI devel] Jenkins vs master (and v1.8)

2014-11-11 Thread Gilles Gouaillardet
n the weekly call ? > >Cheers, > >Gilles > > > >On 2014/11/11 17:38, Mike Dubman wrote: > >how about if I will disable the failing test(s) and make jenkins to pass? It >will help us to make sure we don`t break something that did work before? On >Tue, Nov 11, 2014 at

Re: [OMPI devel] Jenkins vs master (and v1.8)

2014-11-11 Thread Gilles Gouaillardet
t(s) and make jenkins to pass? > It will help us to make sure we don`t break something that did work before? > > On Tue, Nov 11, 2014 at 7:02 AM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: > >> Mike, >> >> Jenkins runs automat

[OMPI devel] Jenkins vs master (and v1.8)

2014-11-11 Thread Gilles Gouaillardet
Mike, Jenkins runs automated tests on each pull request, and i think this is a good thing. recently, it reported a bunch of failures but i could not find anything to blame in the PR itself. so i created a dummy PR https://github.com/open-mpi/ompi/pull/264 with git commit --allow-empty and waited

Re: [OMPI devel] OMPI devel] Pull requests on the trunk

2014-11-06 Thread Gilles Gouaillardet
My bad (mostly). I made quite a lot of PRs to get some review before committing to the master, and did not follow up in a timely manner. I closed two obsolete PRs today. #245 should be ready for prime time. #227 too unless George has an objection. I asked Jeff to review #232 and #228 because

Re: [OMPI devel] simple_spawn test fails using different set of btls.

2014-11-06 Thread Gilles Gouaillardet
J_RELEASEd by the btl add_proc if it is unreachable ? */ Cheers, Gilles On 2014/11/06 12:46, Ralph Castain wrote: >> On Nov 5, 2014, at 6:11 PM, Gilles Gouaillardet >> <gilles.gouaillar...@iferc.org> wrote: >> >> Elena, >> >> the first case (-mca bt

Re: [OMPI devel] simple_spawn test fails using different set of btls.

2014-11-05 Thread Gilles Gouaillardet
Elena, the first case (-mca btl tcp,self) crashing is a bug and i will have a look at it. the second case (-mca sm,self) is a feature : the sm btl cannot be used with tasks having different jobids (this is the case after a spawn), and obviously self cannot be used either, so the behaviour and

Re: [OMPI devel] OMPI 1.8.4rc1 issues

2014-11-04 Thread Gilles Gouaillardet
Ralph, On 2014/11/04 1:54, Ralph Castain wrote: > Hi folks > > Looking at the over-the-weekend MTT reports plus at least one comment on the > list, we have the following issues to address: > > * many-to-one continues to fail. Shall I just assume this is an unfixable > problem or a bad test and

Re: [OMPI devel] [1.8.4rc1] REGRESSION on Solaris-11/x86 with two subnets

2014-11-04 Thread Gilles Gouaillardet
t; section - i.e., add the flag if we are under solaris, > regardless of someone asking for thread support. Since we require that > libevent be thread-enabled, it seemed safer to always ensure those flags are > set. > > >> On Nov 3, 2014, at 9:05 PM, Gilles Gouaillardet >

Re: [OMPI devel] [1.8.4rc1] REGRESSION on Solaris-11/x86 with two subnets

2014-11-04 Thread Gilles Gouaillardet
Ralph, FYI, attached is the patch i am working on (still testing ...). aa207ad2f3de5b649e5439d06dca90d86f5a82c2 should be reverted then. Cheers, Gilles On 2014/11/04 13:56, Paul Hargrove wrote: > Ralph, > > You will see from the message I sent a moment ago that -D_REENTRANT on > Solaris

Re: [OMPI devel] OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-198-g68bec0a

2014-11-01 Thread Gilles Gouaillardet
------- >> https://github.com/open-mpi/ompi/commit/68bec0ae1f022e095c132b3f8c7317238b318416 >> >> commit 68bec0ae1f022e095c132b3f8c7317238b318416 >> Merge: 76ee98c 672d967 >> Author: Gilles Gouaillardet <gilles.gouaillar...@iferc.

[OMPI devel] btl/openib and MPI_Intercomm_create on the same host

2014-10-31 Thread Gilles Gouaillardet
Folks, currently, the dynamic/intercomm_create test fails if run on one host with an IB port : mpirun -np 1 ./intercomm_create /* misleading error message is opal/mca/btl/openib/connect/btl_openib_connect_udcm.c:1899:udcm_process_messages] could not find associated endpoint */ this program spawns one

Re: [OMPI devel] errno and reentrance

2014-10-27 Thread Gilles Gouaillardet
gcc, llvm-gcc > and clang through those OS revs) > > Though I have access, I did not try compute nodes on BG/Q or Cray X{E,K,C}. > Let me know if any of those are of significant concern. > > I no longer have AIX or IRIX access. > > -Paul > > > On Mon, Oct 27, 2014

Re: [OMPI devel] errno and reentrance

2014-10-27 Thread Gilles Gouaillardet
Thanks Paul ! Gilles On 2014/10/27 18:47, Paul Hargrove wrote: > On Mon, Oct 27, 2014 at 2:42 AM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: > [...] > >> Paul, since you have access to many platforms, could you please run this >> test

[OMPI devel] errno and reentrance

2014-10-27 Thread Gilles Gouaillardet
Folks, While investigating an issue started at http://www.open-mpi.org/community/lists/users/2014/10/25562.php i found that it is mandatory to compile with -D_REENTRANT on Solaris (10 and 11) (otherwise errno is not thread-specific, and the pmix thread silently misinterprets EAGAIN or
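You can check whether errno is per-thread in a given build by setting it in a helper thread and verifying that the calling thread's copy did not change. This is a hypothetical sketch, not the test posted in this thread. It assumes POSIX threads (link with -pthread where required). It also assumes that pthread_create() and pthread_join() leave errno untouched on success, which POSIX does not strictly guarantee.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Helper thread: set errno and report the value it sees. */
static void *set_errno_in_thread(void *arg)
{
    assert(arg != NULL);
    errno = EAGAIN;
    *(int *)arg = errno;
    return NULL;
}

/* Returns 1 if errno behaved as a per-thread variable, 0 if the helper
 * thread's write leaked into this thread, -1 if threading failed. */
static int errno_is_thread_local(void)
{
    pthread_t tid;
    int seen_in_thread = 0;

    errno = 0;
    if (pthread_create(&tid, NULL, set_errno_in_thread, &seen_in_thread) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return (seen_in_thread == EAGAIN && errno == 0) ? 1 : 0;
}
```

errno_is_thread_local() returns 1 when the main thread still sees errno == 0 after the helper set EAGAIN. According to the post, a build where errno is a single global (Solaris without -D_REENTRANT) would be expected to return 0.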

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-102-gc9c5d40

2014-10-23 Thread Gilles Gouaillardet
on a heterogeneous cluster. could you please have a look at it when you get a chance ? Cheers, Gilles On 2014/10/16 12:26, Gilles Gouaillardet wrote: > OK, revert done : > > commit b5aea782cec116af095a7e7a7310e9e2a018 > Author: Gilles Gouaillardet <gilles.gouaillar...@iferc.org>

Re: [OMPI devel] origin/v1.8 - compilation failed

2014-10-23 Thread Gilles Gouaillardet
Mike, the root cause is vader was not fully backported to v1.8 (two OPAL_* macros were not backported to OMPI_*) i fixed it in https://github.com/open-mpi/ompi-release/pull/49 please note a similar warning is fixed in https://github.com/open-mpi/ompi-release/pull/48 Cheers, Gilles On

Re: [OMPI devel] OMPI BCOL hang with PMI1

2014-10-23 Thread Gilles Gouaillardet
the program. Only some of their combinations. Also I >> am curious why basesmuma module listed twice. >> >> >> >>> Best regards, >>> Elena >>> >>> On Fri, Oct 17, 2014 at 7:01 PM, Artem Polyakov <artpo...@gmail.com> >>> wrote:

Re: [OMPI devel] OMPI BCOL hang with PMI1

2014-10-17 Thread Gilles Gouaillardet
Artem, There is a known issue #235 with modex and i made PR #238 with a tentative fix. Could you please give it a try and report if it solves your problem ? Cheers Gilles Artem Polyakov wrote: >Hello, I have troubles with latest trunk if I use PMI1. > > >For example, if

Re: [OMPI devel] Slurm direct-launch is broken on trunk

2014-10-16 Thread Gilles Gouaillardet
9b9 Author: Gilles Gouaillardet <gilles.gouaillar...@iferc.org> Date: Thu Oct 16 13:29:32 2014 +0900 pmi/s1: fix large keys do not overwrite the PMI key when pushing a message that does not fit within 255 bytes diff --git a/opal/mca/pmix/base/p
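According to the commit message, the idea behind the pmi/s1 fix is that a message that does not fit within 255 bytes must not be pushed repeatedly under the same PMI key. Below is a hypothetical sketch of one way to do that: split the value into pieces stored under distinct keys. The "<key>:<index>" naming scheme, kv_chunk and split_value are all made up for illustration. They are not the format or code used by the pmix/s1 component.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical key/value pair; sizes chosen for the sketch only. */
typedef struct {
    char key[64];
    char value[256]; /* room for up to 255 bytes plus the terminator */
} kv_chunk;

/* Splits value into pieces of at most max_len bytes, each stored under
 * its own key "<key>:<index>", so no piece overwrites another.
 * Returns the number of pieces written, or -1 if out[] is too small or
 * a generated key would not fit. */
static int split_value(const char *key, const char *value, size_t max_len,
                       kv_chunk out[], int max_chunks)
{
    size_t total = strlen(value);
    int n = 0;

    assert(max_len > 0 && max_len < sizeof(out[0].value));
    /* "|| n == 0" still emits one (empty) piece for an empty value. */
    for (size_t off = 0; off < total || n == 0; off += max_len) {
        size_t len = (total - off < max_len) ? total - off : max_len;
        int w;

        if (n >= max_chunks)
            return -1;
        /* A distinct key per piece is what prevents the overwrite. */
        w = snprintf(out[n].key, sizeof(out[n].key), "%s:%d", key, n);
        if (w < 0 || (size_t)w >= sizeof(out[n].key))
            return -1;
        memcpy(out[n].value, value + off, len);
        out[n].value[len] = '\0';
        n++;
    }
    return n;
}
```

Splitting a 600-byte value with max_len = 255 yields three entries, modex:0, modex:1 and modex:2, holding 255, 255 and 90 bytes. A reader would reassemble them in index order.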

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-102-gc9c5d40

2014-10-16 Thread Gilles Gouaillardet
OK, revert done : commit b5aea782cec116af095a7e7a7310e9e2a018 Author: Gilles Gouaillardet <gilles.gouaillar...@iferc.org> Date: Thu Oct 16 12:24:38 2014 +0900 Revert "Fix heterogeneous support" Per the discussion at http://

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-102-gc9c5d40

2014-10-16 Thread Gilles Gouaillardet
Ralph, let me quickly reply about this one : On 2014/10/16 12:00, Ralph Castain wrote: > I also don't understand some of the changes in this commit. For example, why > did you replace the OPAL_MODEX_SEND_STRING macro with essentially a > hard-coded replica of that macro? OPAL_MODEX_SEND_STRING

Re: [OMPI devel] Fwd: [OMPI commits] Git: open-mpi/ompi branch master updated. dev-102-gc9c5d40

2014-10-15 Thread Gilles Gouaillardet
ose revisions listed above that are new to this repository have >> not appeared on any other notification email; so we list those >> revisions in full, below. >> >> - Log ----- >> https://github.com/open-

Re: [OMPI devel] OMPI devel] OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-07 Thread Gilles Gouaillardet
wrote: > On Oct 7, 2014, at 6:57 AM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: > > > so far, using webhooks looks really simple :-) > > Good! > > > a public web server (apache+php) that can > > a) process json requests > > b)

Re: [OMPI devel] OMPI devel] OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-07 Thread Gilles Gouaillardet
sandbox from now. > > Cheers, > > Gilles > > On 2014/10/03 22:20, Gilles Gouaillardet wrote: >> will do ! >> >> Gilles >> >> "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote: >>> That's a possibility. IU could

Re: [OMPI devel] OMPI devel] OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-05 Thread Gilles Gouaillardet
ant option, there are some more political implications (who manage/update/monitor/secure this). the second option (cron script) could be accepted more easily by IU. i will experiment on my sandbox from now. Cheers, Gilles On 2014/10/03 22:20, Gilles Gouaillardet wrote: > will do ! > > Gille

[OMPI devel] ompi github repository is NOT up to date

2014-10-05 Thread Gilles Gouaillardet
Folks, currently, https://github.com/open-mpi/ompi last commit was 13 days ago (see attached snapshot) this is not the most up to date state ! for example, the last commit of my clone is commit 54c839a970fc3025a08fe1c04b7d4b9767078264 Merge: dee6b63 5c5453b Author: Gilles Gouaillardet

Re: [OMPI devel] OMPI devel] OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-03 Thread Gilles Gouaillardet
will do ! Gilles "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote: >That's a possibility. IU could probably host this for us. > >Would you mind looking into how hard this would be? > > >On Oct 3, 2014, at 8:41 AM, Gilles Gouaillardet ><gilles.

Re: [OMPI devel] OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-03 Thread Gilles Gouaillardet
ignees. > >All the OMPI devs have *read* access to ompi-release, meaning you can >create/comment on issues, but not set labels/milestones/assignees. > >I did not expect this behavior. Grumble. Will have to think about it a bit... > > > > >On Oct 3, 2014, at 7:07 AM, Gi

Re: [OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-03 Thread Gilles Gouaillardet
On Fri, Oct 3, 2014 at 7:29 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote: > On Oct 2, 2014, at 11:33 PM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: > > > the most painful part is probably to manually retrieve the git commit id > > of

Re: [OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-03 Thread Gilles Gouaillardet
lestone, and assign the PR to someone, > *after* you create the PR (same with creating issues). > > See https://github.com/open-mpi/ompi/wiki/SubmittingBugs for details: > > > > > > > On Oct 2, 2014, at 11:37 PM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.or

Re: [OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-03 Thread Gilles Gouaillardet
Jeff, i could not find how to apply a label to a PR via the web interface (and i am not sure i can even do that since authority might be required) any idea (maybe a special keyword in the comment ...) ? Cheers, Gilles On 2014/10/03 1:53, Jeff Squyres (jsquyres) wrote: > On Oct 2, 2014, at

Re: [OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-03 Thread Gilles Gouaillardet
e those CMRs as pull requests; probably in some >> cases it's the reporter, probably in some cases it's the owner. :-) >> >> >> On Oct 2, 2014, at 6:33 AM, Gilles Gouaillardet >> <gilles.gouaillar...@iferc.org> wrote: >> >>> Hi Jeff, >&g

Re: [OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-02 Thread Gilles Gouaillardet
Hi Jeff, thumbs up for the migration ! the names here are the CMR owners ('Owned by' field in TRAC) should it be the duty of the creators ('Reported by' field in TRAC) to re-create the CMR instead? /* if not, and from a git log point of view, that means the committer will be the reviewer and not

Re: [OMPI devel] Neighbor collectives with periodic Cartesian topologies of size one

2014-10-01 Thread Gilles Gouaillardet
tive >> selection time for either graph or dist graph. >> >> -Nathan >> >> On Tue, Sep 30, 2014 at 02:03:27PM +0900, Gilles Gouaillardet wrote: >>> Nathan, >>> >>> here is a revision of the previously attached patch, and that supports &

Re: [OMPI devel] MPI_Comm_spawn crashes with the openib btl

2014-10-01 Thread Gilles Gouaillardet
Thanks Ralph ! it did fix the problem Cheers, Gilles On 2014/10/01 3:04, Ralph Castain wrote: > I fixed this in r32818 - the components shouldn't be passing back success if > the requested info isn't found. Hope that fixes the problem. > > > On Sep 30, 2014, at 1:54 AM, Gill

[OMPI devel] MPI_Comm_spawn crashes with the openib btl

2014-09-30 Thread Gilles Gouaillardet
Folks, the dynamic/spawn test from the ibm test suite crashes if the openib btl is detected (the test can be run on one node with an IB port) here is what happens : in mca_btl_openib_proc_create, the macro OPAL_MODEX_RECV(rc, _btl_openib_component.super.btl_version,

Re: [OMPI devel] Neighbor collectives with periodic Cartesian topologies of size one

2014-09-30 Thread Gilles Gouaillardet
oiding changing > anything in topo for 1.8. > > -Nathan > > On Mon, Sep 29, 2014 at 08:02:41PM +0900, Gilles Gouaillardet wrote: >>Nathan, >> >>why not just make the topology information available at that point as you >>described it ? >> &

Re: [OMPI devel] Neighbor collectives with periodic Cartesian topologies of size one

2014-09-29 Thread Gilles Gouaillardet
Nathan, why not just make the topology information available at that point as you described it ? the attached patch does this, could you please review it ? Cheers, Gilles On 2014/09/26 2:50, Nathan Hjelm wrote: > On Tue, Aug 26, 2014 at 07:03:24PM +0300, Lisandro Dalcin wrote: >> I finally
