Re: [OMPI devel] [OMPI users] bus error with openmpi-1.8.2rc2 on Solaris 10 Sparc

2014-08-08 Thread Gilles Gouaillardet
Kawashima-san, This is interesting :-) proc is in the stack and has type orte_process_name_t with typedef uint32_t orte_jobid_t; typedef uint32_t orte_vpid_t; struct orte_process_name_t { orte_jobid_t jobid; /**< Job number */ orte_vpid_t vpid; /**< Process id - equivalent to
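A minimal sketch of the alignment concern discussed in this thread, assuming the bus error comes from accessing the 4-byte-aligned name structure as a single 64-bit value (which SPARC does not tolerate). The helper names `name_to_u64` and `u64_to_name` are hypothetical and are not Open MPI functions; the struct only mirrors the definition quoted above.

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t orte_jobid_t;
typedef uint32_t orte_vpid_t;

/* Mirrors the quoted definition: two 32-bit fields, so the struct is only
 * guaranteed 4-byte alignment, not the 8-byte alignment of a uint64_t. */
typedef struct {
    orte_jobid_t jobid;
    orte_vpid_t  vpid;
} orte_process_name_t;

/* Hypothetical helper: copy the name into a 64-bit value byte by byte.
 * memcpy makes no alignment assumption, unlike *(uint64_t *)name. */
static uint64_t name_to_u64(const orte_process_name_t *name)
{
    uint64_t id;
    memcpy(&id, name, sizeof(id));
    return id;
}

/* Hypothetical helper: the inverse copy, again without a pointer cast. */
static orte_process_name_t u64_to_name(uint64_t id)
{
    orte_process_name_t name;
    memcpy(&name, &id, sizeof(name));
    return name;
}
```

The sketch assumes `sizeof(orte_process_name_t) == 8`, which holds on common ABIs but is not guaranteed by the C standard.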

Re: [OMPI devel] [OMPI users] bus error with openmpi-1.8.2rc2 on Solaris 10 Sparc

2014-08-08 Thread Gilles Gouaillardet
Kawashima-san and all, Here is attached a one off patch for v1.8. /* it does not use the __attribute__ modifier that might not be supported by all compilers */ as far as i am concerned, the same issue is also in the trunk, and if you do not hit it, it just means you are lucky :-) the same issue

Re: [OMPI devel] [OMPI users] bus error with openmpi-1.8.2rc2 on Solaris 10 Sparc

2014-08-08 Thread Gilles Gouaillardet
s an > OPAL_ID_T, we save it as an ORTE_NAME the issue will go away. > > George. > > > > On Fri, Aug 8, 2014 at 1:04 AM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: > >> Kawashima-san and all, >> >> Here is attached a one off patch

[OMPI devel] ibm abort test hangs on one node

2014-08-08 Thread Gilles Gouaillardet
Folks, here is the description of a hang i briefly mentioned a few days ago. with the trunk (i did not check 1.8 ...) simply run on one node : mpirun -np 2 --mca btl sm,self ./abort (the abort test is taken from the ibm test suite : process 0 calls MPI_Abort while process 1 enters an infinite lo

Re: [OMPI devel] ibm abort test hangs on one node

2014-08-11 Thread Gilles Gouaillardet
ize in the first place (which is sufficient but might not be necessary ...) Cheers, Gilles On 2014/08/09 1:27, Ralph Castain wrote: > Committed a fix for this in r32460 - see if I got it! > > On Aug 8, 2014, at 4:02 AM, Gilles Gouaillardet > wrote: > >> Folks, >> &

Re: [OMPI devel] errors and warnings with show_help() usage

2014-08-11 Thread Gilles Gouaillardet
Jeff and all, i fixed the trivial errors in the trunk, there are now 11 non trivial errors. (commits r32490 to r32497) i ran the script vs the v1.8 branch and found 54 errors (first, you need to touch Makefile.ompi-rules in the top-level Open MPI directory in order to make the script happy) Chee

[OMPI devel] trunk hang when nodes have similar but private network

2014-08-13 Thread Gilles Gouaillardet
Folks, i noticed mpirun (trunk) hangs when running any mpi program on two nodes *and* each node has a private network with the same ip (in my case, each node has a private network to a MIC) in order to reproduce the problem, you can simply run (as root) on the two compute nodes brctl addbr br0 if

Re: [OMPI devel] Grammar error in git master: 'You job will now abort'

2014-08-13 Thread Gilles Gouaillardet
Thanks Christopher, this has been fixed in the trunk with r32520 Cheers, Gilles On 2014/08/13 14:49, Christopher Samuel wrote: > Hi all, > > We spotted this in 1.6.5 and git grep shows it's fixed in the > v1.8 branch but in master it's still there: > > samuel@haswell:~/Code/OMPI/ompi-svn-mirror

Re: [OMPI devel] [OMPI users] OpenMPI fails with np > 65

2014-08-13 Thread Gilles Gouaillardet
Lenny, that looks related to #4857 which has been fixed in trunk since r32517 could you please update your openmpi library and try again ? Gilles On 2014/08/13 17:00, Lenny Verkhovsky wrote: > Following Jeff's suggestion adding devel mailing list. > > Hi All, > I am currently facing strange sit

Re: [OMPI devel] [1.8.2rc4] OSHMEM fortran bindings with bad compilers

2014-08-18 Thread Gilles Gouaillardet
Josh, Paul, the problem with old PGI compilers comes from the preprocessor (!) with pgi 12.10 : oshmem/shmem/fortran/start_pes_f.c SHMEM_GENERATE_WEAK_BINDINGS(START_PES, start_pes) gets expanded as #pragma weak START_PES = PSTART_PES SHMEM_GENERATE_WEAK_PRAGMA ( weak start_pes_ = pstart_pes_ )

Re: [OMPI devel] [1.8.2rc4] OSHMEM fortran bindings with bad compilers

2014-08-18 Thread Gilles Gouaillardet
In the case of PGI compilers prior to 13, a workaround is to configure with --disable-oshmem-profile On 2014/08/18 16:21, Gilles Gouaillardet wrote: > Josh, Paul, > > the problem with old PGI compilers comes from the preprocessor (!) > > with pgi 12.10 : > oshmem/shmem/for

Re: [OMPI devel] OMPI devel] [1.8.2rc4] OSHMEM fortran bindings with bad compilers

2014-08-19 Thread Gilles Gouaillardet
r32551 now detects this limitation and automatically disables oshmem profile. I am now revamping the patch for v1.8 Gilles Gilles Gouaillardet wrote: >In the case of PGI compilers prior to 13, a workaround is to configure with >--disable-oshmem-profile > >On 2014/08/18 1

[OMPI devel] MPI_Abort does not make mpirun return with the right exit code

2014-08-20 Thread Gilles Gouaillardet
Folks, let's look at the following trivial test program : #include #include int main (int argc, char * argv[]) { int rank, size; MPI_Init(&argc, &argv); MPI_Comm_size(MPI_COMM_WORLD, &size); MPI_Comm_rank(MPI_COMM_WORLD, &rank); printf ("I am %d/%d and i abort\n", rank, siz

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32555 - trunk/opal/mca/btl/scif

2014-08-20 Thread Gilles Gouaillardet
> int main (void) > { > struct S y = { .i = x.i }; > return y.i; > } > > > -Paul > > > On Wed, Aug 20, 2014 at 7:20 AM, Nathan Hjelm wrote: > >> Really? That means PGI 2013 is NOT C99 compliant! Figures. >> >> -Nathan >> >>

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32555 - trunk/opal/mca/btl/scif

2014-08-21 Thread Gilles Gouaillardet
struct in order to preserve the old behaviour. > > Ashley. > > On 21 Aug 2014, at 04:31, Gilles Gouaillardet > wrote: > >> Paul, >> >> the piece of code that causes an issue with PGI 2013 and older is just a bit >> more complex. >> >&g

Re: [OMPI devel] MPI_Abort does not make mpirun return with the right exit code

2014-08-22 Thread Gilles Gouaillardet
Cheers, Gilles On 2014/08/21 6:21, Ralph Castain wrote: > I'm aware of the problem, but it will be fixed when the PMIx branch is merged > later this week. > > On Aug 19, 2014, at 10:00 PM, Gilles Gouaillardet > wrote: > >> Folks, >> >> let's look

Re: [OMPI devel] OMPI devel] MPI_Abort does not make mpirun return with the right exit code

2014-08-22 Thread Gilles Gouaillardet
; whereas your mpi_no_op.c return 0; Cheers, Gilles Ralph Castain wrote: >You might want to try again with current head of trunk as something seems off >in what you are seeing - more below > > > >On Aug 22, 2014, at 3:12 AM, Gilles Gouaillardet > wrote: > > >Ralph, &g

Re: [OMPI devel] OMPI devel] MPI_Abort does not make mpirun return with the right exit code

2014-08-25 Thread Gilles Gouaillardet
ass for me > > > On Aug 22, 2014, at 9:12 AM, Ralph Castain wrote: > >> On Aug 22, 2014, at 9:06 AM, Gilles Gouaillardet >> wrote: >> >>> Ralph, >>> >>> Will do on Monday >>> >>> About the first test, in my case echo $?

[OMPI devel] pmix: race condition in dynamic/intercomm_create from the ibm test suite

2014-08-25 Thread Gilles Gouaillardet
Folks, when i run mpirun -np 1 ./intercomm_create from the ibm test suite, it either : - succeeds - hangs - mpirun crashes (SIGSEGV) soon after writing the following message ORTE_ERROR_LOG: Not found in file ../../../src/ompi-trunk/orte/orted/pmix/pmix_server.c at line 566 here is what happens :

Re: [OMPI devel] OMPI devel] pmix: race condition in dynamic/intercomm_create from the ibm test suite

2014-08-25 Thread Gilles Gouaillardet
ture to ensure we aren't getting it confused. > >On Aug 25, 2014, at 1:59 AM, Gilles Gouaillardet > wrote: > >> Folks, >> >> when i run >> mpirun -np 1 ./intercomm_create >> from the ibm test suite, it either : >> - success >> - hangs >&

[OMPI devel] about the test_shmem_zero_get.x test from the openshmem test suite

2014-08-26 Thread Gilles Gouaillardet
Folks, the test_shmem_zero_get.x from the openshmem-release-1.0d test suite is currently failing. i looked at the test itself, and compared it to test_shmem_zero_put.x (that is a success) and i am very puzzled ... the test calls several flavors of shmem_*_get where : - the destination is in the

[OMPI devel] coll/ml without hwloc (?)

2014-08-26 Thread Gilles Gouaillardet
Folks, i just committed r32604 in order to fix compilation (pmix) when ompi is configured with --without-hwloc. now, even a trivial hello world program issues the following output (which is non fatal, and could even be reported as a warning) : [soleil][[32389,1],0][../../../../../../src/ompi-tru

[OMPI devel] intercomm_create from the ibm test suite hangs

2014-08-27 Thread Gilles Gouaillardet
Folks, the intercomm_create test case from the ibm test suite can hang under some configuration. basically, it will spawn n tasks in a first communicator, and then n tasks in a second communicator. when i run from node0 : mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2 ./interco

Re: [OMPI devel] intercomm_create from the ibm test suite hangs

2014-08-28 Thread Gilles Gouaillardet
Thanks Ralph ! Cheers, Gilles On 2014/08/28 4:52, Ralph Castain wrote: > Took me awhile to track this down, but it is now fixed - combination of > several minor errors > > Thanks > Ralph > > On Aug 27, 2014, at 4:07 AM, Gilles Gouaillardet > wrote: > >> Folks

Re: [OMPI devel] segfault in openib component on trunk

2014-08-29 Thread Gilles Gouaillardet
Howard and Edgar, i fixed a few bugs (r32639 and r32642) the bug is trivial to reproduce with any mpi hello world program mpirun -np 2 --mca btl openib,self hello_world after setting the mca param in the $HOME/.openmpi/mca-params.conf $ cat ~/.openmpi/mca-params.conf btl_openib_receive_queues

[OMPI devel] mpirun hangs when a task exits with a non zero code

2014-08-29 Thread Gilles Gouaillardet
Ralph and all, The following trivial test hangs /* it hangs at least 99% of the time in my environment, 1% is a race condition and the program behaves as expected */ mpirun -np 1 --mca btl self /bin/false the same behaviour happens with the following trivial MPI program : #include int main (in

Re: [OMPI devel] segfault in openib component on trunk

2014-08-29 Thread Gilles Gouaillardet
point to the original problem that was trying to be > addressed. > > > On Aug 28, 2014, at 10:01 PM, Gilles Gouaillardet > wrote: > >> Howard and Edgar, >> >> i fixed a few bugs (r32639 and r32642) >> >> the bug is trivial to reproduce with any mpi

Re: [OMPI devel] oshmem-openmpi-1.8.2 causes compile error with -i8(64bit fortarn integer) configuration

2014-09-01 Thread Gilles Gouaillardet
Mishima-san, the root cause is macro expansion does not always occur as one would have expected ... could you please give a try to the attached patch ? it compiles (at least with gcc) and i made zero tests so far Cheers, Gilles On 2014/09/01 10:44, tmish...@jcity.maeda.co.jp wrote: > Hi

Re: [OMPI devel] about the test_shmem_zero_get.x test from the openshmem test suite

2014-09-01 Thread Gilles Gouaillardet
, Jeff Squyres (jsquyres) wrote: > Gilles -- > > Did you get a reply about this? > > > On Aug 26, 2014, at 3:17 AM, Gilles Gouaillardet > wrote: > >> Folks, >> >> the test_shmem_zero_get.x from the openshmem-release-1.0d test suite is >> currently

[OMPI devel] race condition in coll/ml

2014-09-01 Thread Gilles Gouaillardet
Folks, mtt recently failed a bunch of times with the trunk. a good suspect is the collective/ibarrier test from the ibm test suite. most of the time, CHECK_AND_RECYCLE will fail /* IS_COLL_SYNCMEM(coll_op) is true */ with this test case, we just get a glorious SIGSEGV since OBJ_RELEASE is called on

Re: [OMPI devel] OMPI devel] race condition in coll/ml

2014-09-01 Thread Gilles Gouaillardet
ality requirement. > >Did this patch "fix" the problem by avoiding the segfault due to coll/ml >disqualifying itself? Or did it make everything work okay again? > > >On Sep 1, 2014, at 3:16 AM, Gilles Gouaillardet > wrote: > >> Folks, >> >> mtt rece

[OMPI devel] f08 bindings and weak symbols

2014-09-05 Thread Gilles Gouaillardet
Folks, when OpenMPI is configured with --disable-weak-symbols and a fortran 2008 capable compiler (e.g. gcc 4.9), MPI_STATUSES_IGNORE invoked from Fortran is not correctly interpreted as it should. /* instead of being a special array of statuses, it is an array of one status, which can lead to buf

[OMPI devel] about r32685

2014-09-08 Thread Gilles Gouaillardet
Ralph and Brice, i noted Ralph committed r32685 in order to fix a problem with Intel compilers. A very similar issue occurs with clang 3.2 (gcc and clang 3.4 are ok for me) imho, the root cause is in the hwloc configure. in this case, configure fails to detect strncasecmp is part of the C includ

Re: [OMPI devel] about r32685

2014-09-08 Thread Gilles Gouaillardet
he config change. All I can say is that > "tolower" on my CentOS box is defined in , and that has to be > included in the misc.h header. > > > On Sep 8, 2014, at 5:49 PM, Gilles Gouaillardet > wrote: > >> Ralph and Brice, >> >> i noted Ralph c

Re: [OMPI devel] about r32685

2014-09-09 Thread Gilles Gouaillardet
ut this detection code is already a mess so I'd rather no change it again. > > Brice > > > > Le 09/09/2014 04:56, Gilles Gouaillardet a écrit : >> Ralph, >> >> ok, let me clarify my point : >> >> tolower() is invoked in : >> opal/mca/hwloc/hwlo

[OMPI devel] race condition in grpcomm/rcd

2014-09-09 Thread Gilles Gouaillardet
Folks, Since r32672 (trunk), grpcomm/rcd is the default module. the attached spawn.c test program is a trimmed version of the spawn_with_env_vars.c test case from the ibm test suite. when invoked on two nodes : - the program hangs with -np 2 - the program can crash with np > 2 error message is [n

Re: [OMPI devel] Need to know your Github ID

2014-09-11 Thread Gilles Gouaillardet
ggouaillardet -> ggouaillardet On 2014/09/10 19:46, Jeff Squyres (jsquyres) wrote: > As the next step of the planned migration to Github, I need to know: > > - Your Github ID (so that you can be added to the new OMPI git repo) > - Your SVN ID (so that I can map SVN->Github IDs, and therefore map T

Re: [OMPI devel] race condition in grpcomm/rcd

2014-09-11 Thread Gilles Gouaillardet
em to establish a >> persistent receive. They then can use the signature to tell which collective >> the incoming message belongs to. >> >> I'll fix it, but it won't be until tomorrow I'm afraid as today is shot. >> >> >> On Sep 9, 2014, at 3:10 AM

Re: [OMPI devel] race condition in grpcomm/rcd

2014-09-11 Thread Gilles Gouaillardet
the right fix, it was very lightly tested, but so far, it works for me ... Cheers, Gilles On 2014/09/11 16:11, Gilles Gouaillardet wrote: > Ralph, > > things got worse indeed :-( > > now a simple hello world involving two hosts hangs in mpi_init. > there is still a race conditio

Re: [OMPI devel] race condition in grpcomm/rcd

2014-09-11 Thread Gilles Gouaillardet
> On Sep 11, 2014, at 4:02 AM, Gilles Gouaillardet > wrote: > >> Ralph, >> >> the root cause is when the second orted/mpirun runs rcd_finalize_coll, >> it does not invoke pmix_server_release >> because allgather_stub was not previously invoked since the

Re: [OMPI devel] race condition in grpcomm/rcd

2014-09-12 Thread Gilles Gouaillardet
2 and 3 enter the allgather at the same time, they will send a message to each other simultaneously and rml fails to establish the connection. i could not find whether this is linked to my changes... Cheers, Gilles > > On Sep 11, 2014, at 5:23 PM, Gilles Gouaillardet < > gilles.gouai

Re: [OMPI devel] coll ml error with some nonblocking collectives

2014-09-15 Thread Gilles Gouaillardet
Howard, and Rolf, i initially reported the issue at http://www.open-mpi.org/community/lists/devel/2014/09/15767.php r32659 is not a fix nor a regression, it simply aborts instead of OBJ_RELEASE(mpi_comm_world). /* my point here is we should focus on the root cause and not the consequence */ firs

[OMPI devel] race condition in oob/tcp

2014-09-16 Thread Gilles Gouaillardet
Ralph, here is the full description of a race condition in oob/tcp i very briefly mentioned in a previous post : the race condition can occur when two orted that are not yet connected try to send a message to each other for the first time and at the same time. that can occur when running mpi helloworld on 4

Re: [OMPI devel] race condition in oob/tcp

2014-09-17 Thread Gilles Gouaillardet
nnections, and then have the higher vpid retry while the lower one waits. > The logic for that was still in place, but it looks like you are hitting a > different code path, and I found another potential one as well. So I think I > plugged the holes, but will wait to hear if you confi
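The tie-breaking rule quoted above (the higher vpid retries while the lower one waits) can be sketched as a pure decision function. This is only an illustration of the rule as stated in the reply; the enum and the function name `resolve_simultaneous_connect` are hypothetical and do not appear in the oob/tcp code.

```c
#include <stdint.h>

/* Hypothetical outcome of a connection collision between two daemons. */
typedef enum {
    COLLISION_WAIT_FOR_PEER,  /* keep listening, let the peer reconnect */
    COLLISION_RETRY_CONNECT   /* drop our attempt and connect again     */
} collision_action_t;

/* Hypothetical helper: both sides apply the same deterministic rule, so
 * exactly one of them retries and the two cannot collide again forever.
 * The rule follows the quoted text: higher vpid retries, lower waits. */
static collision_action_t resolve_simultaneous_connect(uint32_t my_vpid,
                                                       uint32_t peer_vpid)
{
    /* Equal vpids should not occur between distinct daemons; this sketch
     * treats that case as "wait". */
    return (my_vpid > peer_vpid) ? COLLISION_RETRY_CONNECT
                                 : COLLISION_WAIT_FOR_PEER;
}
```

Because the decision depends only on the two vpids, each side reaches the complementary answer without exchanging any further messages.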

Re: [OMPI devel] race condition in oob/tcp

2014-09-18 Thread Gilles Gouaillardet
e that triggers it so I > can continue debugging > > Ralph > > On Sep 17, 2014, at 4:07 AM, Gilles Gouaillardet > wrote: > >> Thanks Ralph, >> >> this is much better but there is still a bug : >> with the very same scenario i described earlier, vpid 2

[OMPI devel] RFC: remove the --with-threads configure option

2014-09-18 Thread Gilles Gouaillardet
Folks, for both trunk and v1.8 branch, configure takes the --with-threads option. valid usages are --with-threads, --with-threads=yes, --with-threads=posix and --with-threads=no /* v1.6 used to support the --with-threads=solaris */ if we try to configure with --with-threads=no, this will result

[OMPI devel] v1.8 does not compile any more

2014-09-19 Thread Gilles Gouaillardet
Folks, r32716 broke v1.8 :-( the root cause is that it included MCA_BASE_VAR_TYPE_VERSION_STRING which has not yet landed into v1.8 the attached trivial patch fixes this issue Can the RM/GK please review it and apply it ? Cheers, Gilles Index: opal/mca/base/mca_base_var.c =

Re: [OMPI devel] race condition in oob/tcp

2014-09-19 Thread Gilles Gouaillardet
moved from MCA_OOB_TCP_CONNECT_ACK to MCA_OOB_TCP_CLOSED, retry() should have been invoked ? Cheers, Gilles On 2014/09/18 17:02, Ralph Castain wrote: > The patch looks fine to me - please go ahead and apply it. Thanks! > > On Sep 17, 2014, at 11:35 PM, Gilles Gouaillardet > wrote: > &

Re: [OMPI devel] race condition in oob/tcp

2014-09-19 Thread Gilles Gouaillardet
es On Fri, Sep 19, 2014 at 8:06 PM, Gilles Gouaillardet < gilles.gouaillar...@iferc.org> wrote: > Ralph, > > i found an other race condition. > in a very specific scenario, vpid3 is in the MCA_OOB_TCP_CLOSED state, > and processes data from the socket received from vpid 2 >

Re: [OMPI devel] race condition in oob/tcp

2014-09-21 Thread Gilles Gouaillardet
Thanks for the pointer George ! On Sat, Sep 20, 2014 at 5:46 AM, George Bosilca wrote: > Or copy the handshake protocol design of the TCP BTL... > > the main difference between oob/tcp and btl/tcp is the way we resolve the situation in which two processes send their first message to each other a

Re: [OMPI devel] race condition in oob/tcp

2014-09-22 Thread Gilles Gouaillardet
that release too long. > Alternatively, I can take care of it if you don't have time (I'm asking if > you can do it solely because you have the reproducer). > > > On Sep 21, 2014, at 6:54 AM, Ralph Castain wrote: > > Sounds fine with me - please go ahead, and thanks &

Re: [OMPI devel] RFC: "v1.9.0" (vs. "v1.9")

2014-09-22 Thread Gilles Gouaillardet
Folks, if i read between the lines, it looks like the next stable branch will be v2.0 and not v1.10 is there a strong reason for that (such as ABI compatibility will break, or a major but internal refactoring) ? /* other than v1.10 is less than v1.8 when comparing strings :-) */ Cheers, Gilles
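The remark about v1.10 sorting before v1.8 as a string can be illustrated with a small numeric comparison. This is a sketch only: `compare_versions` is a hypothetical helper, it expects dotted numeric strings without the leading "v", and it ignores suffixes such as "rc2".

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: compare dotted version strings component by
 * component as numbers. Returns <0, 0 or >0 like strcmp. A plain strcmp
 * would rank "1.10" before "1.8" because '1' sorts before '8'. */
static int compare_versions(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    for (;;) {
        char *end_a, *end_b;
        long na = strtol(a, &end_a, 10);  /* missing component parses as 0 */
        long nb = strtol(b, &end_b, 10);
        if (na != nb) {
            return (na < nb) ? -1 : 1;
        }
        /* Stop once neither string has a further ".N" component. */
        if (*end_a != '.' && *end_b != '.') {
            return 0;
        }
        a = (*end_a == '.') ? end_a + 1 : end_a;
        b = (*end_b == '.') ? end_b + 1 : end_b;
    }
}
```

With this helper, "1.10" compares greater than "1.8", and a shorter string such as "1.8" is treated as "1.8.0".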

Re: [OMPI devel] Conversion to GitHub: POSTPONED

2014-09-23 Thread Gilles Gouaillardet
my 0.02 US$ ... Bitbucket pricing model is per user (but with free public/private repository up to 5 users) whereas github pricing is per *private* repository (and free public repository and with unlimited users) from an OpenMPI point of view, this means : - with github, only the private ompi-tes

Re: [OMPI devel] race condition in oob/tcp

2014-09-26 Thread Gilles Gouaillardet
e race condition vis 1.8 - I agree it > is not a blocker for that release. > > Ralph > > On Sep 22, 2014, at 4:49 PM, Gilles Gouaillardet > wrote: > >> Ralph, >> >> here is the patch i am using so far. >> i will resume working on this from Wednesday (t

Re: [OMPI devel] Neighbor collectives with periodic Cartesian topologies of size one

2014-09-29 Thread Gilles Gouaillardet
Nathan, why not just make the topology information available at that point as you described it ? the attached patch does this, could you please review it ? Cheers, Gilles On 2014/09/26 2:50, Nathan Hjelm wrote: > On Tue, Aug 26, 2014 at 07:03:24PM +0300, Lisandro Dalcin wrote: >> I finally man

Re: [OMPI devel] Neighbor collectives with periodic Cartesian topologies of size one

2014-09-30 Thread Gilles Gouaillardet
oiding changing > anything in topo for 1.8. > > -Nathan > > On Mon, Sep 29, 2014 at 08:02:41PM +0900, Gilles Gouaillardet wrote: >>Nathan, >> >>why not just make the topology information available at that point as you >>described it ? >> &

[OMPI devel] MPI_Comm_spawn crashes with the openib btl

2014-09-30 Thread Gilles Gouaillardet
Folks, the dynamic/spawn test from the ibm test suite crashes if the openib btl is detected (the test can be run on one node with an IB port) here is what happens : in mca_btl_openib_proc_create, the macro OPAL_MODEX_RECV(rc, &mca_btl_openib_component.super.btl_version, p

Re: [OMPI devel] MPI_Comm_spawn crashes with the openib btl

2014-10-01 Thread Gilles Gouaillardet
Thanks Ralph ! it did fix the problem Cheers, Gilles On 2014/10/01 3:04, Ralph Castain wrote: > I fixed this in r32818 - the components shouldn't be passing back success if > the requested info isn't found. Hope that fixes the problem. > > > On Sep 30, 2014, at 1:5

Re: [OMPI devel] Neighbor collectives with periodic Cartesian topologies of size one

2014-10-01 Thread Gilles Gouaillardet
time for either graph or dist graph. >> >> -Nathan >> >> On Tue, Sep 30, 2014 at 02:03:27PM +0900, Gilles Gouaillardet wrote: >>> Nathan, >>> >>> here is a revision of the previously attached patch, and that supports >>> graph and di

Re: [OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-02 Thread Gilles Gouaillardet
Hi Jeff, thumbs up for the migration ! the names here are the CMR owners ('Owned by' field in TRAC) should it be the duty of the creators ('Reported by' field in TRAC) to re-create the CMR instead? /* if not, and from a git log point of view, that means the commiter will be the reviewer and not

Re: [OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-02 Thread Gilles Gouaillardet
ose CMRs as pull requests; probably in some >> cases it's the reporter, probably in some cases it's the owner. :-) >> >> >> On Oct 2, 2014, at 6:33 AM, Gilles Gouaillardet >> wrote: >> >>> Hi Jeff, >>> >>> thumbs up for the

Re: [OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-02 Thread Gilles Gouaillardet
Jeff, i could not find how to apply a label to a PR via the web interface (and i am not sure i can even do that since authority might be required) any idea (maybe a special keyword in the comment ...) ? Cheers, Gilles On 2014/10/03 1:53, Jeff Squyres (jsquyres) wrote: > On Oct 2, 2014, at 12:4

Re: [OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-03 Thread Gilles Gouaillardet
PR to someone, > *after* you create the PR (same with creating issues). > > See https://github.com/open-mpi/ompi/wiki/SubmittingBugs for details: > > > > > > > On Oct 2, 2014, at 11:37 PM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: > > >

Re: [OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-03 Thread Gilles Gouaillardet
On Fri, Oct 3, 2014 at 7:29 PM, Jeff Squyres (jsquyres) wrote: > On Oct 2, 2014, at 11:33 PM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: > > > the most painful part is probably to manually retrieve the git commit id > > of a given svn commit id >

Re: [OMPI devel] OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-03 Thread Gilles Gouaillardet
the OMPI devs have *read* access to ompi-release, meaning you can >create/comment on issues, but not set labels/milestones/assignees. > >I did not expect this behavior. Grumble. Will have to think about it a bit... > > > > >On Oct 3, 2014, at 7:07 AM, Gilles Gouaill

Re: [OMPI devel] OMPI devel] OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-03 Thread Gilles Gouaillardet
will do ! Gilles "Jeff Squyres (jsquyres)" wrote: >That's a possibility. IU could probably host this for us. > >Would you mind looking into how hard this would be? > > >On Oct 3, 2014, at 8:41 AM, Gilles Gouaillardet > wrote: > >> Jeff, >>

[OMPI devel] ompi github repository is NOT up to date

2014-10-05 Thread Gilles Gouaillardet
Folks, currently, https://github.com/open-mpi/ompi last commit was 13 days ago (see attached snapshot) this is not the most up to date state ! for example, the last commit of my clone is commit 54c839a970fc3025a08fe1c04b7d4b9767078264 Merge: dee6b63 5c5453b Author: Gilles Gouaillardet List

Re: [OMPI devel] OMPI devel] OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-05 Thread Gilles Gouaillardet
ion, there are some more political implications (who manage/update/monitor/secure this). the second option (cron script) could be accepted more easily by IU. i will experiment on my sandbox from now. Cheers, Gilles On 2014/10/03 22:20, Gilles Gouaillardet wrote: > will do ! > > Gille

Re: [OMPI devel] OMPI devel] OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-07 Thread Gilles Gouaillardet
sandbox from now. > > Cheers, > > Gilles > > On 2014/10/03 22:20, Gilles Gouaillardet wrote: >> will do ! >> >> Gilles >> >> "Jeff Squyres (jsquyres)" wrote: >>> That's a possibility. IU could probably host this for us. &

Re: [OMPI devel] OMPI devel] OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-07 Thread Gilles Gouaillardet
014, at 6:57 AM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: > > > so far, using webhooks looks really simple :-) > > Good! > > > a public web server (apache+php) that can > > a) process json requests > > b) issue curl requests > > is

Re: [OMPI devel] OMPI devel] OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-08 Thread Gilles Gouaillardet
be 100% sure ... */ Cheers, Gilles On 2014/10/07 22:55, Jeff Squyres (jsquyres) wrote: > Sounds perfect. > > On Oct 7, 2014, at 9:49 AM, Gilles Gouaillardet > wrote: > >> Jeff, >> >> that should not be an issue since github provides the infrastructure to >

Re: [OMPI devel] Fwd: [OMPI commits] Git: open-mpi/ompi branch master updated. dev-102-gc9c5d40

2014-10-15 Thread Gilles Gouaillardet
>> Those revisions listed above that are new to this repository have >> not appeared on any other notification email; so we list those >> revisions in full, below. >> >> - Log --------- >> https://github.com/o

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-102-gc9c5d40

2014-10-15 Thread Gilles Gouaillardet
Ralph, let me quickly reply about this one : On 2014/10/16 12:00, Ralph Castain wrote: > I also don't understand some of the changes in this commit. For example, why > did you replace the OPAL_MODEX_SEND_STRING macro with essentially a > hard-coded replica of that macro? OPAL_MODEX_SEND_STRING

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-102-gc9c5d40

2014-10-15 Thread Gilles Gouaillardet
OK, revert done : commit b5aea782cec116af095a7e7a7310e9e2a018 Author: Gilles Gouaillardet List-Post: devel@lists.open-mpi.org Date: Thu Oct 16 12:24:38 2014 +0900 Revert "Fix heterogeneous support" Per the discussion at http://www.open-mpi.org/community/lists/devel/201

Re: [OMPI devel] Slurm direct-launch is broken on trunk

2014-10-16 Thread Gilles Gouaillardet
9b9 Author: Gilles Gouaillardet List-Post: devel@lists.open-mpi.org Date: Thu Oct 16 13:29:32 2014 +0900 pmi/s1: fix large keys do not overwrite the PMI key when pushing a message that does not fit within 255 bytes diff --git a/opal/mca/pmix/base/pmix_base_fns.c b/opal/mca

Re: [OMPI devel] OMPI BCOL hang with PMI1

2014-10-17 Thread Gilles Gouaillardet
Artem, There is a known issue #235 with modex and i made PR #238 with a tentative fix. Could you please give it a try and report whether it solves your problem ? Cheers Gilles Artem Polyakov wrote: >Hello, I have troubles with latest trunk if I use PMI1. > > >For example, if I use 2 nodes the app

Re: [OMPI devel] OMPI BCOL hang with PMI1

2014-10-23 Thread Gilles Gouaillardet
ombinations. Also I >> am curious why basesmuma module listed twice. >> >> >> >>> Best regards, >>> Elena >>> >>> On Fri, Oct 17, 2014 at 7:01 PM, Artem Polyakov >>> wrote: >>> >>>> Gilles, >>>> >&

Re: [OMPI devel] origin/v1.8 - compilation failed

2014-10-23 Thread Gilles Gouaillardet
Mike, the root cause is vader was not fully backported to v1.8 (two OPAL_* macros were not backported to OMPI_*) i fixed it in https://github.com/open-mpi/ompi-release/pull/49 please note a similar warning is fixed in https://github.com/open-mpi/ompi-release/pull/48 Cheers, Gilles On 2014/10/

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-102-gc9c5d40

2014-10-23 Thread Gilles Gouaillardet
heterogeneous cluster. could you please have a look at it when you get a chance ? Cheers, Gilles On 2014/10/16 12:26, Gilles Gouaillardet wrote: > OK, revert done : > > commit b5aea782cec116af095a7e7a7310e9e2a018 > Author: Gilles Gouaillardet > Date: Thu Oct 16 12:2

[OMPI devel] errno and reentrance

2014-10-27 Thread Gilles Gouaillardet
Folks, While investigating an issue started at http://www.open-mpi.org/community/lists/users/2014/10/25562.php i found that it is mandatory to compile with -D_REENTRANT on Solaris (10 and 11) (otherwise errno is not per thread specific, and the pmix thread silently misinterpret EAGAIN or EWOULDBLO
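A small sketch of the kind of check that depends on errno being per-thread, assuming the affected code tests for EAGAIN or EWOULDBLOCK after a non-blocking call. The helper name `is_would_block` is hypothetical; the point is that the value passed in must be the calling thread's own errno, which the message above says requires -D_REENTRANT on Solaris.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: classify an errno value captured right after a
 * failed non-blocking read/write/connect. If errno were shared between
 * threads, the value seen here could belong to another thread's call and
 * a transient condition could be mistaken for a hard failure. */
static bool is_would_block(int saved_errno)
{
    /* The two constants are equal on some systems and distinct on others,
     * so both are tested. */
    return saved_errno == EAGAIN || saved_errno == EWOULDBLOCK;
}
```

A caller would typically copy errno into a local variable immediately after the failing call and pass that copy to the helper.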

Re: [OMPI devel] errno and reentrance

2014-10-27 Thread Gilles Gouaillardet
Thanks Paul ! Gilles On 2014/10/27 18:47, Paul Hargrove wrote: > On Mon, Oct 27, 2014 at 2:42 AM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: > [...] > >> Paul, since you have access to many platforms, could you please run this >> test

Re: [OMPI devel] errno and reentrance

2014-10-27 Thread Gilles Gouaillardet
which has been gcc, llvm-gcc > and clang through those OS revs) > > Though I have access, I did not try compute nodes on BG/Q or Cray X{E,K,C}. > Let me know if any of those are of significant concern. > > I no longer have AIX or IRIX access. > > -Paul > > > On Mon

[OMPI devel] btl/openib and MPI_Intercomm_create on the same host

2014-10-31 Thread Gilles Gouaillardet
Folks, currently, the dynamic/intercomm_create fails if run on one host with an IB port : mpirun -np 1 ./intercomm_create /* misleading error message is opal/mca/btl/openib/connect/btl_openib_connect_udcm.c:1899:udcm_process_messages] could not find associated endpoint */ this program spawns one

Re: [OMPI devel] OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-198-g68bec0a

2014-11-01 Thread Gilles Gouaillardet
----- >> https://github.com/open-mpi/ompi/commit/68bec0ae1f022e095c132b3f8c7317238b318416 >> >> commit 68bec0ae1f022e095c132b3f8c7317238b318416 >> Merge: 76ee98c 672d967 >> Author: Gilles Gouaillardet >> Date: Fri Oct 31 16:34:43 201

Re: [OMPI devel] [1.8.4rc1] REGRESSION on Solaris-11/x86 with two subnets

2014-11-04 Thread Gilles Gouaillardet
Ralph, FYI, here is attached the patch i am working on (still testing ...) aa207ad2f3de5b649e5439d06dca90d86f5a82c2 should be reverted then. Cheers, Gilles On 2014/11/04 13:56, Paul Hargrove wrote: > Ralph, > > You will see from the message I sent a moment ago that -D_REENTRANT on > Solaris a

Re: [OMPI devel] [1.8.4rc1] REGRESSION on Solaris-11/x86 with two subnets

2014-11-04 Thread Gilles Gouaillardet
t; section - i.e., add the flag if we are under solaris, > regardless of someone asking for thread support. Since we require that > libevent be thread-enabled, it seemed safer to always ensure those flags are > set. > > >> On Nov 3, 2014, at 9:05 PM, Gilles Gouaillardet >&

Re: [OMPI devel] OMPI 1.8.4rc1 issues

2014-11-04 Thread Gilles Gouaillardet
Ralph, On 2014/11/04 1:54, Ralph Castain wrote: > Hi folks > > Looking at the over-the-weekend MTT reports plus at least one comment on the > list, we have the following issues to address: > > * many-to-one continues to fail. Shall I just assume this is an unfixable > problem or a bad test and i

Re: [OMPI devel] simple_spawn test fails using different set of btls.

2014-11-05 Thread Gilles Gouaillardet
Elena, the first case (-mca btl tcp,self) crashing is a bug and i will have a look at it. the second case (-mca btl sm,self) is a feature : the sm btl cannot be used with tasks having different jobids (this is the case after a spawn), and obviously, self cannot be used either, so the behaviour and erro

Re: [OMPI devel] simple_spawn test fails using different set of btls.

2014-11-06 Thread Gilles Gouaillardet
SEd by the btl add_proc if it is unreachable ? */ Cheers, Gilles On 2014/11/06 12:46, Ralph Castain wrote: >> On Nov 5, 2014, at 6:11 PM, Gilles Gouaillardet >> wrote: >> >> Elena, >> >> the first case (-mca btl tcp,self) crashing is a bug and i will have a

Re: [OMPI devel] OMPI devel] Pull requests on the trunk

2014-11-06 Thread Gilles Gouaillardet
My bad (mostly). I made quite a lot of PRs to get some review before committing to the master, and did not follow up in a timely manner. I closed two obsolete PRs today. #245 should be ready for prime time. #227 too unless George has an objection. I asked Jeff to review #232 and #228 because they

[OMPI devel] Jenkins vs master (and v1.8)

2014-11-11 Thread Gilles Gouaillardet
Mike, Jenkins runs automated tests on each pull request, and i think this is a good thing. recently, it reported a bunch of failures but i could not find anything to blame in the PR itself. so i created a dummy PR https://github.com/open-mpi/ompi/pull/264 with git commit --allow-empty and waited

Re: [OMPI devel] Jenkins vs master (and v1.8)

2014-11-11 Thread Gilles Gouaillardet
t(s) and make jenkins to pass? > It will help us to make sure we don`t break something that did work before? > > On Tue, Nov 11, 2014 at 7:02 AM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: > >> Mike, >> >> Jenkins runs automated tests o

Re: [OMPI devel] OMPI devel] Jenkins vs master (and v1.8)

2014-11-11 Thread Gilles Gouaillardet
Thanks Mike, BTW what is the distro running on your test cluster ? Mike Dubman wrote: >ok, I disabled vader tests in SHMEM and it passes. > >it can be requested from jenkins by specifying "vader" in PR comment line. > > >On Tue, Nov 11, 2014 at 11:04 AM, Gilles Go

[OMPI devel] oshmem: put does not work with btl/vader if knem is enabled

2014-11-12 Thread Gilles Gouaillardet
Folks, I found (at least) two issues with oshmem put if btl/vader is used with knem enabled : $ oshrun -np 2 --mca btl vader,self ./oshmem_max_reduction -- SHMEM_ABORT was invoked on rank 0 (pid 11936, host=soleil) with error

Re: [OMPI devel] Error in version 1.8.3?!

2014-11-13 Thread Gilles Gouaillardet
Hartmut, this is a known bug. in the meantime, can you give 1.8.4rc1 a try ? http://www.open-mpi.org/software/ompi/v1.8/downloads/openmpi-1.8.4rc1.tar.gz /* if i remember correctly, this is fixed already in the rc1 */ Cheers, Gilles On 2014/11/13 19:48, Hartmut Häfner (SCC) wrote: > Dear d

Re: [OMPI devel] Question about tight integration with not-yet-supported queuing systems

2014-11-18 Thread Gilles Gouaillardet
Hi Marc, OpenLava is based on a pretty old version of LSF (4.x if i remember correctly) and i do not think LSF had support for tight integration of parallel jobs at that time. my understanding is that basically, there are two kinds of direct integration : - mpirun launch: mpirun spawns orted via the A

Re: [OMPI devel] Question about tight integration with not-yet-supported queuing systems

2014-11-18 Thread Gilles Gouaillardet
y > Uppsala University, Sweden > marc.hoepp...@bils.se > >> On 18 Nov 2014, at 08:40, Gilles Gouaillardet >> wrote: >> >> Hi Marc, >> >> OpenLava is based on a pretty old version of LSF (4.x if i remember >> correctly) >> and i do not think L

Re: [OMPI devel] [OMPI users] MPI_Neighbor_alltoallw fails with mpi-1.8.3

2014-11-21 Thread Gilles Gouaillardet
Hi Ghislain, that sounds like a bug in MPI_Dist_graph_create :-( you can use MPI_Dist_graph_create_adjacent instead : MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD, degrees, &targets[0], &weights[0], degrees, &targets[0], &weights[0], info, rankReordering, &commGraph); it

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Gilles Gouaillardet
Ralph and Paul, On 2014/11/26 10:37, Ralph Castain wrote: > So it looks like the issue isn't so much with our code as it is with the OS > stack, yes? We aren't requiring that the loopback be "up", but the stack is > in order to establish the connection, even when we are trying a non-lo > interf

[OMPI devel] race condition in abort can cause mpirun v1.8 hang

2014-11-26 Thread Gilles Gouaillardet
Ralph, i noted several hangs in mtt with the v1.8 branch. a simple way to reproduce it is to use the MPI_Errhandler_fatal_f test from the intel_tests suite, invoke mpirun on one node and run the tasks on another node : node0$ mpirun -np 3 -host node1 --mca btl tcp,self ./MPI_Errhandler_fatal_f
