Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread Gilles Gouaillardet
Paul, from the logs, the only difference i see is about Fortran PROCEDURE. openmpi 1.8 (svn checkout) does not build the usempif08 bindings if PROCEDURE is not supported. from the logs, openmpi 1.8.1 does not check whether PROCEDURE is supported or not. here is the sample program to check

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread Gilles Gouaillardet
5-Illegal procedure interface - mpi_user_function (conftest.f90: > 12) 0 inform, 0 warnings, 2 severes, 0 fatal for test_proc > {hargrove@hopper04 OMPI}$ pgf90 -V pgf90 13.10-0 64-bit target on x86-64 > Linux -tp shanghai The Portland Group - PGI Compilers and Tools Copyright > (c) 2013,

Re: [OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Gilles Gouaillardet
George, #4815 is indirectly related to the move : in bcol/basesmuma, we used to compare ompi_process_name_t, and now we (try to) compare an ompi_process_name_t and an opal_process_name_t (which causes a glorious SIGSEGV) i proposed a temporary patch which is both broken and inelegant, could you

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread Gilles Gouaillardet
Paul, this is a fair point. i committed r32354 in order to abort configure in this case Cheers, Gilles On 2014/07/30 15:11, Paul Hargrove wrote: > On a related topic: > > I configured with an explicit --enable-mpi-fortran=usempif08. > Then configure found PROCEDURE was missing/broken. > The

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Gilles GOUAILLARDET
e/util/name_fns.c:522 > >522       if (name1->jobid < name2->jobid) { > >(gdb) print name1 > >$1 = (const orte_process_name_t *) 0x192350001 > >(gdb) print *name1 > >Cannot access memory at address 0x192350001 > >(gdb) print name2 > >$2 = (const orte_proce

Re: [OMPI devel] OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread Gilles GOUAILLARDET
't have a PGI compiler. I also didn't specify a level of Fortran >support, but just had --enable-mpi-fortran > >Maybe we need to revert this commit until we figure out a better solution? > >On Jul 30, 2014, at 12:16 AM, Gilles Gouaillardet ><gilles.gouaillar...@iferc.org>

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Gilles Gouaillardet
stead of a pointer to the name > > r32357 > > On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET < > gilles.gouaillar...@gmail.com> wrote: > > Rolf, > > r32353 can be seen as a suspect... > Even if it is correct, it might have exposed the bug discussed in #48

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread Gilles Gouaillardet
Paul and all, For what it's worth, with openmpi 1.8.2rc2 and the intel fortran compiler version 14.0.3.174 : $ nm libmpi_usempif08.so| grep -i sizeof there is no such undefined symbol (mpi_f08_sizeof_) as a temporary workaround, did you try to force the linker use

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread Gilles Gouaillardet
Paul, in .../ompi/mpi/fortran/use-mpi-f08, can you create the following dumb test program, compile and run nm | grep f08 on the object : $ cat foo.f90 program foo use mpi_f08_sizeof implicit none real :: x integer :: size, ierror call MPI_Sizeof_real_s_4(x, size, ierror) stop end program

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread Gilles Gouaillardet
Paul, the ibm test suite from the non public ompi-tests repository has several tests for usempif08. Cheers, Gilles On 2014/08/01 11:04, Paul Hargrove wrote: > Second related issue: > > Can/should examples/hello_usempif08.f90 be extended to use more of the > module such that it would have

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32388 - trunk/ompi/mca/pml/ob1

2014-08-01 Thread Gilles Gouaillardet
Paul, As Ralph pointed out, this issue was reported last month on the user mailing list. #include did not help : http://www.open-mpi.org/community/lists/users/2014/07/24883.php I will see if i can reproduce and fix this issue on a solaris10 (but x86) VM BTW, are you using the GNU compiler ?

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32388 - trunk/ompi/mca/pml/ob1

2014-08-01 Thread Gilles Gouaillardet
e Studio > compilers are preferred. > Let me know if you need me to try any of those gcc installations. > > -Paul > > > On Thu, Jul 31, 2014 at 9:12 PM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: > >> Paul, >> >> As Ralph pointed

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Gilles Gouaillardet
Paul and Ralph, for what it's worth : a) i faced the very same issue on my (slow) qemu emulated ppc64 vm b) i was able to run very basic programs when passing --mca coll ^ml to mpirun Cheers, Gilles On 2014/08/01 12:30, Ralph Castain wrote: > Yes, I fear this will require some effort to

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Gilles Gouaillardet
pagesize = 65536; /* safer to overestimate than under */ > #endif > > > opal_pagesize() anyone? > > -Paul > > On Fri, Aug 1, 2014 at 12:50 AM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: > >> Paul, >> >> you are absolu

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-01 Thread Gilles Gouaillardet
? /* and then _process_name_jobid_for_opal, _process_name_vpid_for_opal, opal_process_name_vpid_should_never_be_called should also be updated */ Cheers, Gilles On 2014/08/01 19:52, Gilles Gouaillardet wrote: > George and Ralph, > > i am very confused whether there is an issue or not. > > > anyway, today Pa

Re: [OMPI devel] 1.8.2rc3 now out

2014-08-01 Thread Gilles Gouaillardet
Paul, about the second point : mmap is called with the MAP_FIXED flag, before the fix, the required address was not aligned on a page size and hence mmap failed. the mmap failure was immediately handled, but for some reason i did not fully investigate yet, this failure was not correctly

Re: [OMPI devel] OMPI devel] trunk warnings on x86

2014-08-03 Thread Gilles GOUAILLARDET
Paul, imho, the root cause is a missing ampersand. I will double check this from tomorrow only Cheers, Gilles Ralph Castain wrote: >Arg - that raises an interesting point. This is a pointer to a 64-bit number. >Will uintptr_t resolve that problem on such platforms? > >

Re: [OMPI devel] OMPI devel] trunk warnings on x86

2014-08-03 Thread Gilles Gouaillardet
on both 32 and 64 bits arch in this case : #if OPAL_ENABLE_DEBUG static inline orte_process_name_t * OMPI_CAST_RTE_NAME(opal_process_name_t * name); #else #define OMPI_CAST_RTE_NAME(a) ((orte_process_name_t*)(a)) #endif Cheers, Gilles On 2014/08/03 14:49, Gilles GOUAILLARDET wrote: > P

Re: [OMPI devel] OMPI devel] trunk warnings on x86

2014-08-04 Thread Gilles Gouaillardet
out 25% of the time --mca btl scif,self => always hang only the mpirun process remains and is hanging. i will try to debug this, and i welcome any help ! Cheers, Gilles On 2014/08/04 11:57, Gilles Gouaillardet wrote: > Paul, > > i confirm ampersand was missing and this was a bug

Re: [OMPI devel] 1.8.2rc3 now out

2014-08-04 Thread Gilles Gouaillardet
Fixed in r32409 : %d and %s were swapped in a MLERROR (printf like) Gilles On 2014/08/02 11:07, Gilles Gouaillardet wrote: > Paul, > > about the second point : > mmap is called with the MAP_FIXED flag, before the fix, the > required address was not aligned on a page size and henc

Re: [OMPI devel] oshmem enabled by default

2014-08-04 Thread Gilles Gouaillardet
Paul, this is a bit trickier ... on a Linux platform oshmem is built by default, on a non Linux platform, oshmem is *not* built by default. so the configure message (disabled by default) is correct on non Linux platform, and incorrect on Linux platform ... i do not know what should be done,

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread Gilles Gouaillardet
eing empty (bswap_64 or something). > > George. > > On Aug 1, 2014, at 06:52 , Gilles Gouaillardet > <gilles.gouaillar...@iferc.org> wrote: > >> George and Ralph, >> >> i am very confused whether there is an issue or not. >> >> >> anyway, t

Re: [OMPI devel] [1.8.2rc3] static linking fails on linux when not building ROMIO

2014-08-05 Thread Gilles Gouaillardet
from libopen-pal.la : dependency_libs=' -lrdmacm -libverbs -lscif -lnuma -ldl -lrt -lnsl -lutil -lm' i confirm mpicc fails linking but FWIW, using libtool does work (!) could the bug come from the mpicc (and other) wrappers ? Gilles $ gcc -g -O0 -o hw /csc/home1/gouaillardet/hw.c

Re: [OMPI devel] [1.8.2rc3] static linking fails on linux when not building ROMIO

2014-08-05 Thread Gilles Gouaillardet
Here is a patch that has been minimally tested. this is likely an overkill (at least when dynamic libraries can be used), but it does the job so far ... Cheers, Gilles On 2014/08/05 16:56, Gilles Gouaillardet wrote: > from libopen-pal.la : > dependency_libs=' -lrdmacm -libverbs -lscif

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread Gilles Gouaillardet
y speaking, converting a 64 bits to a big endian >>> representation requires the swap of the 2 32 bits parts. So the correct >>> approach would have been: >>> uint64_t htonll(uint64_t v) >>> { >>> return uint64_t)ntohl(n)) << 32 | (uint64_t)ntoh

Re: [OMPI devel] [1.8.2rc3] static linking fails on linux when not building ROMIO

2014-08-06 Thread Gilles Gouaillardet
testing. > Since I've determined already that 1.6.5 did not have the problem while > 1.7.x does, the possibility exists that some smaller change might exist to > restore what ever was lost between the v1.6 and v1.7 branches. > > -Paul > > > On Tue, Aug 5, 2014 at 1:

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-06 Thread Gilles Gouaillardet
Ralph and George, here is attached a patch that fixes the heterogeneous support without the abstraction violation. Cheers, Gilles On 2014/08/06 9:40, Gilles Gouaillardet wrote: > hummm > > i intentionally did not swap the two 32 bits (!) > > from the top level, what we have i

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-07 Thread Gilles Gouaillardet
to the POSIX prototype (aka. returning the changes value >> instead of doing things inplace). >> >> George. >> >> >> >> On Wed, Aug 6, 2014 at 7:02 AM, Gilles Gouaillardet >> <gilles.gouaillar...@iferc.org> wrote: >> Ralph and George,

Re: [OMPI devel] v1.8.2 still held up...

2014-08-08 Thread Gilles Gouaillardet
Ralph and all, > * static linking failure - Gilles has posted a proposed fix, but somebody > needs to approve and CMR it. Please see: > https://svn.open-mpi.org/trac/ompi/ticket/4834 Jeff made a better fix (r32447) to which i added a minor correction (r32448). as far as i am concerned,

Re: [OMPI devel] [OMPI users] bus error with openmpi-1.8.2rc2 on Solaris 10 Sparc

2014-08-08 Thread Gilles Gouaillardet
Kawashima-san, This is interesting :-) proc is in the stack and has type orte_process_name_t with typedef uint32_t orte_jobid_t; typedef uint32_t orte_vpid_t; struct orte_process_name_t { orte_jobid_t jobid; /**< Job number */ orte_vpid_t vpid; /**< Process id - equivalent to

Re: [OMPI devel] [OMPI users] bus error with openmpi-1.8.2rc2 on Solaris 10 Sparc

2014-08-08 Thread Gilles Gouaillardet
as an ORTE_NAME the issue will go away. > > George. > > > > On Fri, Aug 8, 2014 at 1:04 AM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: > >> Kawashima-san and all, >> >> Here is attached a one off patch for v1.8. >> /* it does

[OMPI devel] ibm abort test hangs on one node

2014-08-08 Thread Gilles Gouaillardet
Folks, here is the description of a hang i briefly mentioned a few days ago. with the trunk (i did not check 1.8 ...) simply run on one node : mpirun -np 2 --mca btl sm,self ./abort (the abort test is taken from the ibm test suite : process 0 calls MPI_Abort while process 1 enters an infinite

Re: [OMPI devel] ibm abort test hangs on one node

2014-08-11 Thread Gilles Gouaillardet
r_finalize in the first place (which is sufficient but might not be necessary ...) Cheers, Gilles On 2014/08/09 1:27, Ralph Castain wrote: > Committed a fix for this in r32460 - see if I got it! > > On Aug 8, 2014, at 4:02 AM, Gilles Gouaillardet > <gilles.gouaillar...@iferc.org> w

Re: [OMPI devel] errors and warnings with show_help() usage

2014-08-11 Thread Gilles Gouaillardet
Jeff and all, i fixed the trivial errors in the trunk, there are now 11 non trivial errors. (commits r32490 to r32497) i ran the script vs the v1.8 branch and found 54 errors (first, you need to touch Makefile.ompi-rules in the top-level Open MPI directory in order to make the script happy)

[OMPI devel] trunk hang when nodes have similar but private network

2014-08-13 Thread Gilles Gouaillardet
Folks, i noticed mpirun (trunk) hangs when running any mpi program on two nodes *and* each node has a private network with the same ip (in my case, each node has a private network to a MIC) in order to reproduce the problem, you can simply run (as root) on the two compute nodes brctl addbr br0

Re: [OMPI devel] Grammar error in git master: 'You job will now abort'

2014-08-13 Thread Gilles Gouaillardet
Thanks Christopher, this has been fixed in the trunk with r32520 Cheers, Gilles On 2014/08/13 14:49, Christopher Samuel wrote: > Hi all, > > We spotted this in 1.6.5 and git grep shows it's fixed in the > v1.8 branch but in master it's still there: > >

Re: [OMPI devel] [OMPI users] OpenMPI fails with np > 65

2014-08-13 Thread Gilles Gouaillardet
Lenny, that looks related to #4857 which has been fixed in trunk since r32517 could you please update your openmpi library and try again ? Gilles On 2014/08/13 17:00, Lenny Verkhovsky wrote: > Following Jeff's suggestion adding devel mailing list. > > Hi All, > I am currently facing strange

Re: [OMPI devel] [1.8.2rc4] OSHMEM fortran bindings with bad compilers

2014-08-18 Thread Gilles Gouaillardet
Josh, Paul, the problem with old PGI compilers comes from the preprocessor (!) with pgi 12.10 : oshmem/shmem/fortran/start_pes_f.c SHMEM_GENERATE_WEAK_BINDINGS(START_PES, start_pes) gets expanded as #pragma weak START_PES = PSTART_PES SHMEM_GENERATE_WEAK_PRAGMA ( weak start_pes_ = pstart_pes_

Re: [OMPI devel] [1.8.2rc4] OSHMEM fortran bindings with bad compilers

2014-08-18 Thread Gilles Gouaillardet
In the case of PGI compilers prior to 13, a workaround is to configure with --disable-oshmem-profile On 2014/08/18 16:21, Gilles Gouaillardet wrote: > Josh, Paul, > > the problem with old PGI compilers comes from the preprocessor (!) > > with pgi 12.10 : > oshmem/shmem/for

Re: [OMPI devel] OMPI devel] [1.8.2rc4] OSHMEM fortran bindings with bad compilers

2014-08-19 Thread Gilles Gouaillardet
r32551 now detects this limitation and automatically disables oshmem profile. I am now revamping the patch for v1.8 Gilles Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote: >In the case of PGI compilers prior to 13, a workaround is to configure with >--disable-oshmem-profil

[OMPI devel] MPI_Abort does not make mpirun return with the right exit code

2014-08-20 Thread Gilles Gouaillardet
Folks, let's look at the following trivial test program : #include <stdio.h> #include <mpi.h> int main (int argc, char * argv[]) { int rank, size; MPI_Init(&argc, &argv); MPI_Comm_size(MPI_COMM_WORLD, &size); MPI_Comm_rank(MPI_COMM_WORLD, &rank); printf ("I am %d/%d and i abort\n", rank, size);

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32555 - trunk/opal/mca/btl/scif

2014-08-21 Thread Gilles Gouaillardet
> -Nathan >> >> On Tue, Aug 19, 2014 at 10:48:48PM -0400, svn-commit-mai...@open-mpi.org >> wrote: >>> Author: ggouaillardet (Gilles Gouaillardet) >>> Date: 2014-08-19 22:48:47 EDT (Tue, 19 Aug 2014) >>> New Revision: 32555 >>> URL: ht

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32555 - trunk/opal/mca/btl/scif

2014-08-21 Thread Gilles Gouaillardet
he struct in order to preserve the old behaviour. > > Ashley. > > On 21 Aug 2014, at 04:31, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> > wrote: > >> Paul, >> >> the piece of code that causes an issue with PGI 2013 and older is just a bit >> mor

Re: [OMPI devel] OMPI devel] MPI_Abort does not make mpirun return with the right exit code

2014-08-22 Thread Gilles Gouaillardet
; whereas your mpi_no_op.c return 0; Cheers, Gilles Ralph Castain <r...@open-mpi.org> wrote: >You might want to try again with current head of trunk as something seems off >in what you are seeing - more below > > > >On Aug 22, 2014, at 3:12 AM, Gilles Gouaillardet ><

Re: [OMPI devel] OMPI devel] MPI_Abort does not make mpirun return with the right exit code

2014-08-25 Thread Gilles Gouaillardet
ass for me > > > On Aug 22, 2014, at 9:12 AM, Ralph Castain <r...@open-mpi.org> wrote: > >> On Aug 22, 2014, at 9:06 AM, Gilles Gouaillardet >> <gilles.gouaillar...@gmail.com> wrote: >> >>> Ralph, >>> >>> Will do on Monday >

[OMPI devel] pmix: race condition in dynamic/intercomm_create from the ibm test suite

2014-08-25 Thread Gilles Gouaillardet
Folks, when i run mpirun -np 1 ./intercomm_create from the ibm test suite, it either : - succeeds - hangs - mpirun crashes (SIGSEGV) soon after writing the following message ORTE_ERROR_LOG: Not found in file ../../../src/ompi-trunk/orte/orted/pmix/pmix_server.c at line 566 here is what happens :

Re: [OMPI devel] OMPI devel] pmix: race condition in dynamic/intercomm_create from the ibm test suite

2014-08-25 Thread Gilles Gouaillardet
>look at that signature to ensure we aren't getting it confused. > >On Aug 25, 2014, at 1:59 AM, Gilles Gouaillardet ><gilles.gouaillar...@iferc.org> wrote: > >> Folks, >> >> when i run >> mpirun -np 1 ./intercomm_create >> from the ibm test

[OMPI devel] about the test_shmem_zero_get.x test from the openshmem test suite

2014-08-26 Thread Gilles Gouaillardet
Folks, the test_shmem_zero_get.x from the openshmem-release-1.0d test suite is currently failing. i looked at the test itself, and compared it to test_shmem_zero_put.x (that is a success) and i am very puzzled ... the test calls several flavors of shmem_*_get where : - the destination is in the

[OMPI devel] coll/ml without hwloc (?)

2014-08-26 Thread Gilles Gouaillardet
Folks, i just committed r32604 in order to fix compilation (pmix) when ompi is configured with --without-hwloc now, even a trivial hello world program issues the following output (which is non fatal, and could even be reported as a warning) :

[OMPI devel] intercomm_create from the ibm test suite hangs

2014-08-27 Thread Gilles Gouaillardet
Folks, the intercomm_create test case from the ibm test suite can hang under some configuration. basically, it will spawn n tasks in a first communicator, and then n tasks in a second communicator. when i run from node0 : mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2

Re: [OMPI devel] intercomm_create from the ibm test suite hangs

2014-08-28 Thread Gilles Gouaillardet
Thanks Ralph ! Cheers, Gilles On 2014/08/28 4:52, Ralph Castain wrote: > Took me awhile to track this down, but it is now fixed - combination of > several minor errors > > Thanks > Ralph > > On Aug 27, 2014, at 4:07 AM, Gilles Gouaillardet > <gilles.gouaillar...@i

Re: [OMPI devel] segfault in openib component on trunk

2014-08-29 Thread Gilles Gouaillardet
Howard and Edgar, i fixed a few bugs (r32639 and r32642) the bug is trivial to reproduce with any mpi hello world program mpirun -np 2 --mca btl openib,self hello_world after setting the mca param in the $HOME/.openmpi/mca-params.conf $ cat ~/.openmpi/mca-params.conf btl_openib_receive_queues

[OMPI devel] mpirun hangs when a task exits with a non zero code

2014-08-29 Thread Gilles Gouaillardet
Ralph and all, The following trivial test hangs /* it hangs at least 99% of the time in my environment, 1% is a race condition and the program behaves as expected */ mpirun -np 1 --mca btl self /bin/false the same behaviour happens with the following trivial but MPI program : #include <mpi.h> int main

Re: [OMPI devel] segfault in openib component on trunk

2014-08-29 Thread Gilles Gouaillardet
original problem that was trying to be > addressed. > > > On Aug 28, 2014, at 10:01 PM, Gilles Gouaillardet > <gilles.gouaillar...@iferc.org> wrote: > >> Howard and Edgar, >> >> i fixed a few bugs (r32639 and r32642) >> >> the bug is trivial

Re: [OMPI devel] oshmem-openmpi-1.8.2 causes compile error with -i8(64bit fortarn integer) configuration

2014-09-01 Thread Gilles Gouaillardet
Mishima-san, the root cause is macro expansion does not always occur as one would have expected ... could you please give a try to the attached patch ? it compiles (at least with gcc) and i made zero tests so far Cheers, Gilles On 2014/09/01 10:44, tmish...@jcity.maeda.co.jp wrote: > Hi

Re: [OMPI devel] about the test_shmem_zero_get.x test from the openshmem test suite

2014-09-01 Thread Gilles Gouaillardet
, Jeff Squyres (jsquyres) wrote: > Gilles -- > > Did you get a reply about this? > > > On Aug 26, 2014, at 3:17 AM, Gilles Gouaillardet > <gilles.gouaillar...@iferc.org> wrote: > >> Folks, >> >> the test_shmem_zero_get.x from the openshmem-release-

[OMPI devel] race condition in coll/ml

2014-09-01 Thread Gilles Gouaillardet
Folks, mtt recently failed a bunch of times with the trunk. a good suspect is the collective/ibarrier test from the ibm test suite. most of the time, CHECK_AND_RECYCLE will fail /* IS_COLL_SYNCMEM(coll_op) is true */ with this test case, we just get a glory SIGSEGV since OBJ_RELEASE is called

Re: [OMPI devel] OMPI devel] race condition in coll/ml

2014-09-01 Thread Gilles Gouaillardet
ion of the coll/ml locality requirement. > >Did this patch "fix" the problem by avoiding the segfault due to coll/ml >disqualifying itself? Or did it make everything work okay again? > > >On Sep 1, 2014, at 3:16 AM, Gilles Gouaillardet ><gilles.gouaillar...@iferc.org>

[OMPI devel] f08 bindings and weak symbols

2014-09-05 Thread Gilles Gouaillardet
Folks, when OpenMPI is configured with --disable-weak-symbols and a fortran 2008 capable compiler (e.g. gcc 4.9), MPI_STATUSES_IGNORE invoked from Fortran is not interpreted as it should be. /* instead of being a special array of statuses, it is an array of one status, which can lead to

[OMPI devel] about r32685

2014-09-08 Thread Gilles Gouaillardet
Ralph and Brice, i noted Ralph committed r32685 in order to fix a problem with Intel compilers. The very similar issue occurs with clang 3.2 (gcc and clang 3.4 are ok for me) imho, the root cause is in the hwloc configure. in this case, configure fails to detect strncasecmp is part of the C

Re: [OMPI devel] about r32685

2014-09-08 Thread Gilles Gouaillardet
l I can say is that > "tolower" on my CentOS box is defined in , and that has to be > included in the misc.h header. > > > On Sep 8, 2014, at 5:49 PM, Gilles Gouaillardet > <gilles.gouaillar...@iferc.org> wrote: > >> Ralph and Brice, >> >&g

[OMPI devel] race condition in grpcomm/rcd

2014-09-09 Thread Gilles Gouaillardet
Folks, Since r32672 (trunk), grpcomm/rcd is the default module. the attached spawn.c test program is a trimmed version of the spawn_with_env_vars.c test case from the ibm test suite. when invoked on two nodes : - the program hangs with -np 2 - the program can crash with np > 2 error message is

Re: [OMPI devel] Need to know your Github ID

2014-09-11 Thread Gilles Gouaillardet
ggouaillardet -> ggouaillardet On 2014/09/10 19:46, Jeff Squyres (jsquyres) wrote: > As the next step of the planned migration to Github, I need to know: > > - Your Github ID (so that you can be added to the new OMPI git repo) > - Your SVN ID (so that I can map SVN->Github IDs, and therefore map

Re: [OMPI devel] race condition in grpcomm/rcd

2014-09-11 Thread Gilles Gouaillardet
and for each of them to establish a >> persistent receive. They then can use the signature to tell which collective >> the incoming message belongs to. >> >> I'll fix it, but it won't be until tomorrow I'm afraid as today is shot. >> >> >> On Sep 9, 2014

Re: [OMPI devel] race condition in grpcomm/rcd

2014-09-11 Thread Gilles Gouaillardet
ely not the right fix, it was very lightly tested, but so far, it works for me ... Cheers, Gilles On 2014/09/11 16:11, Gilles Gouaillardet wrote: > Ralph, > > things got worse indeed :-( > > now a simple hello world involving two hosts hangs in mpi_init. > there is still a race conditio

Re: [OMPI devel] race condition in grpcomm/rcd

2014-09-11 Thread Gilles Gouaillardet
, 2014, at 4:02 AM, Gilles Gouaillardet > <gilles.gouaillar...@iferc.org> wrote: > >> Ralph, >> >> the root cause is when the second orted/mpirun runs rcd_finalize_coll, >> it does not invoke pmix_server_release >> because allgather_stub was not previou

Re: [OMPI devel] race condition in grpcomm/rcd

2014-09-12 Thread Gilles Gouaillardet
and 3 enter the allgather at the same time, they will send a message to each other at the same time and rml fails to establish the connection. i could not find whether this is linked to my changes... Cheers, Gilles > > On Sep 11, 2014, at 5:23 PM, Gilles Gouaillardet < > gilles.gouai

Re: [OMPI devel] coll ml error with some nonblocking collectives

2014-09-15 Thread Gilles Gouaillardet
Howard, and Rolf, i initially reported the issue at http://www.open-mpi.org/community/lists/devel/2014/09/15767.php r32659 is not a fix nor a regression, it simply aborts instead of OBJ_RELEASE(mpi_comm_world). /* my point here is we should focus on the root cause and not the consequence */

[OMPI devel] race condition in oob/tcp

2014-09-16 Thread Gilles Gouaillardet
Ralph, here is the full description of a race condition in oob/tcp i very briefly mentioned in a previous post : the race condition can occur when two not connected orted try to send a message to each other for the first time and at the same time. that can occur when running mpi helloworld on

Re: [OMPI devel] race condition in oob/tcp

2014-09-17 Thread Gilles Gouaillardet
then have the higher vpid retry while the lower one waits. > The logic for that was still in place, but it looks like you are hitting a > different code path, and I found another potential one as well. So I think I > plugged the holes, but will wait to hear if you confirm. >

Re: [OMPI devel] race condition in oob/tcp

2014-09-18 Thread Gilles Gouaillardet
t triggers it so I > can continue debugging > > Ralph > > On Sep 17, 2014, at 4:07 AM, Gilles Gouaillardet > <gilles.gouaillar...@iferc.org> wrote: > >> Thanks Ralph, >> >> this is much better but there is still a bug : >> with the very same scen

[OMPI devel] RFC: remove the --with-threads configure option

2014-09-18 Thread Gilles Gouaillardet
Folks, for both trunk and v1.8 branch, configure takes the --with-threads option. valid usages are --with-threads, --with-threads=yes, --with-threads=posix and --with-threads=no /* v1.6 used to support the --with-threads=solaris */ if we try to configure with --with-threads=no, this will result

[OMPI devel] v1.8 does not compile any more

2014-09-19 Thread Gilles Gouaillardet
Folks, r32716 broke v1.8 :-( the root cause is it included MCA_BASE_VAR_TYPE_VERSION_STRING which has not yet landed into v1.8 the attached trivial patch fixes this issue Can the RM/GK please review it and apply it ? Cheers, Gilles Index: opal/mca/base/mca_base_var.c

Re: [OMPI devel] race condition in oob/tcp

2014-09-19 Thread Gilles Gouaillardet
moved from MCA_OOB_TCP_CONNECT_ACK to MCA_OOB_TCP_CLOSED, retry() should have been invoked ? Cheers, Gilles On 2014/09/18 17:02, Ralph Castain wrote: > The patch looks fine to me - please go ahead and apply it. Thanks! > > On Sep 17, 2014, at 11:35 PM, Gilles Gouaillardet > <gilles.goua

Re: [OMPI devel] race condition in oob/tcp

2014-09-21 Thread Gilles Gouaillardet
Thanks for the pointer George ! On Sat, Sep 20, 2014 at 5:46 AM, George Bosilca wrote: > Or copy the handshake protocol design of the TCP BTL... > > the main difference between oob/tcp and btl/tcp is the way we resolve the situation in which two processes send their first

Re: [OMPI devel] race condition in oob/tcp

2014-09-22 Thread Gilles Gouaillardet
o ahead, and thanks > > On Sep 20, 2014, at 10:26 PM, Gilles Gouaillardet < > gilles.gouaillar...@gmail.com> wrote: > > Thanks for the pointer George ! > > On Sat, Sep 20, 2014 at 5:46 AM, George Bosilca <bosi...@icl.utk.edu> > wrote: > >> Or copy the han

Re: [OMPI devel] RFC: "v1.9.0" (vs. "v1.9")

2014-09-22 Thread Gilles Gouaillardet
Folks, if i read between the lines, it looks like the next stable branch will be v2.0 and not v1.10 is there a strong reason for that (such as ABI compatibility will break, or a major but internal refactoring) ? /* other than v1.10 is less than v1.8 when comparing strings :-) */ Cheers, Gilles

Re: [OMPI devel] Conversion to GitHub: POSTPONED

2014-09-24 Thread Gilles Gouaillardet
my 0.02 US$ ... Bitbucket pricing model is per user (but with free public/private repository up to 5 users) whereas github pricing is per *private* repository (and free public repository and with unlimited users) from an OpenMPI point of view, this means : - with github, only the private

Re: [OMPI devel] race condition in oob/tcp

2014-09-26 Thread Gilles Gouaillardet
dition vis 1.8 - I agree it > is not a blocker for that release. > > Ralph > > On Sep 22, 2014, at 4:49 PM, Gilles Gouaillardet > <gilles.gouaillar...@gmail.com> wrote: > >> Ralph, >> >> here is the patch i am using so far. >> i will res

Re: [OMPI devel] Neighbor collectives with periodic Cartesian topologies of size one

2014-09-29 Thread Gilles Gouaillardet
Nathan, why not just make the topology information available at that point as you described it ? the attached patch does this, could you please review it ? Cheers, Gilles On 2014/09/26 2:50, Nathan Hjelm wrote: > On Tue, Aug 26, 2014 at 07:03:24PM +0300, Lisandro Dalcin wrote: >> I finally

Re: [OMPI devel] Neighbor collectives with periodic Cartesian topologies of size one

2014-09-30 Thread Gilles Gouaillardet
oiding changing > anything in topo for 1.8. > > -Nathan > > On Mon, Sep 29, 2014 at 08:02:41PM +0900, Gilles Gouaillardet wrote: >>Nathan, >> >>why not just make the topology information available at that point as you >>described it ? >> >

[OMPI devel] MPI_Comm_spawn crashes with the openib btl

2014-09-30 Thread Gilles Gouaillardet
Folks, the dynamic/spawn test from the ibm test suite crashes if the openib btl is detected (the test can be run on one node with an IB port) here is what happens : in mca_btl_openib_proc_create, the macro OPAL_MODEX_RECV(rc, &mca_btl_openib_component.super.btl_version,

Re: [OMPI devel] MPI_Comm_spawn crashes with the openib btl

2014-10-01 Thread Gilles Gouaillardet
Thanks Ralph ! it did fix the problem Cheers, Gilles On 2014/10/01 3:04, Ralph Castain wrote: > I fixed this in r32818 - the components shouldn't be passing back success if > the requested info isn't found. Hope that fixes the problem. > > > On Sep 30, 2014, at 1:54 AM, Gill

Re: [OMPI devel] Neighbor collectives with periodic Cartesian topologies of size one

2014-10-01 Thread Gilles Gouaillardet
tive >> selection time for either graph or dist graph. >> >> -Nathan >> >> On Tue, Sep 30, 2014 at 02:03:27PM +0900, Gilles Gouaillardet wrote: >>> Nathan, >>> >>> here is a revision of the previously attached patch, and that supports &

Re: [OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-02 Thread Gilles Gouaillardet
Hi Jeff, thumbs up for the migration ! the names here are the CMR owners ('Owned by' field in TRAC) should it be the duty of the creators ('Reported by' field in TRAC) to re-create the CMR instead? /* if not, and from a git log point of view, that means the committer will be the reviewer and not

Re: [OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-03 Thread Gilles Gouaillardet
e those CMRs as pull requests; probably in some >> cases it's the reporter, probably in some cases it's the owner. :-) >> >> >> On Oct 2, 2014, at 6:33 AM, Gilles Gouaillardet >> <gilles.gouaillar...@iferc.org> wrote: >> >>> Hi Jeff, >

Re: [OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-03 Thread Gilles Gouaillardet
Jeff, i could not find how to apply a label to a PR via the web interface (and i am not sure i can even do that since authority might be required) any idea (maybe a special keyword in the comment ...) ? Cheers, Gilles On 2014/10/03 1:53, Jeff Squyres (jsquyres) wrote: > On Oct 2, 2014, at

Re: [OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-03 Thread Gilles Gouaillardet
lestone, and assign the PR to someone, > *after* you create the PR (same with creating issues). > > See https://github.com/open-mpi/ompi/wiki/SubmittingBugs for details: > > > > > > > On Oct 2, 2014, at 11:37 PM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.or

Re: [OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-03 Thread Gilles Gouaillardet
On Fri, Oct 3, 2014 at 7:29 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote: > On Oct 2, 2014, at 11:33 PM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: > > > the most painful part is probably to manually retrieve the git commit id > > of

Re: [OMPI devel] OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-03 Thread Gilles Gouaillardet
ignees. > >All the OMPI devs have *read* access to ompi-release, meaning you can >create/comment on issues, but not set labels/milestones/assignees. > >I did not expect this behavior. Grumble. Will have to think about it a bit... > > > > >On Oct 3, 2014, at 7:07 AM, Gi

Re: [OMPI devel] OMPI devel] OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-03 Thread Gilles Gouaillardet
will do ! Gilles "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote: >That's a possibility. IU could probably host this for us. > >Would you mind looking into how hard this would be? > > >On Oct 3, 2014, at 8:41 AM, Gilles Gouaillardet ><gilles.

[OMPI devel] ompi github repository is NOT up to date

2014-10-05 Thread Gilles Gouaillardet
Folks, currently, https://github.com/open-mpi/ompi last commit was 13 days ago (see attached snapshot) this is not the most up to date state ! for example, the last commit of my clone is commit 54c839a970fc3025a08fe1c04b7d4b9767078264 Merge: dee6b63 5c5453b Author: Gilles Gouaillardet

Re: [OMPI devel] OMPI devel] OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-05 Thread Gilles Gouaillardet
ant option, there are some more political implications (who manages/updates/monitors/secures this). the second option (cron script) could be accepted more easily by IU. i will experiment on my sandbox from now. Cheers, Gilles On 2014/10/03 22:20, Gilles Gouaillardet wrote: > will do ! > > Gille

Re: [OMPI devel] OMPI devel] OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-07 Thread Gilles Gouaillardet
sandbox from now. > > Cheers, > > Gilles > > On 2014/10/03 22:20, Gilles Gouaillardet wrote: >> will do ! >> >> Gilles >> >> "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote: >>> That's a possibility. IU could

Re: [OMPI devel] OMPI devel] OMPI devel] OMPI@GitHub: (Mostly) Open for business

2014-10-07 Thread Gilles Gouaillardet
wrote: > On Oct 7, 2014, at 6:57 AM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: > > > so far, using webhooks looks really simple :-) > > Good! > > > a public web server (apache+php) that can > > a) process json requests > > b)

Re: [OMPI devel] Fwd: [OMPI commits] Git: open-mpi/ompi branch master updated. dev-102-gc9c5d40

2014-10-15 Thread Gilles Gouaillardet
ose revisions listed above that are new to this repository have >> not appeared on any other notification email; so we list those >> revisions in full, below. >> >> - Log ----- >> https://github.com/open-

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-102-gc9c5d40

2014-10-16 Thread Gilles Gouaillardet
Ralph, let me quickly reply about this one : On 2014/10/16 12:00, Ralph Castain wrote: > I also don't understand some of the changes in this commit. For example, why > did you replace the OPAL_MODEX_SEND_STRING macro with essentially a > hard-coded replica of that macro? OPAL_MODEX_SEND_STRING

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-102-gc9c5d40

2014-10-16 Thread Gilles Gouaillardet
OK, revert done : commit b5aea782cec116af095a7e7a7310e9e2a018 Author: Gilles Gouaillardet <gilles.gouaillar...@iferc.org> Date: Thu Oct 16 12:24:38 2014 +0900 Revert "Fix heterogeneous support" Per the discussion at http://

Re: [OMPI devel] Slurm direct-launch is broken on trunk

2014-10-16 Thread Gilles Gouaillardet
9b9 Author: Gilles Gouaillardet <gilles.gouaillar...@iferc.org> Date: Thu Oct 16 13:29:32 2014 +0900 pmi/s1: fix large keys do not overwrite the PMI key when pushing a message that does not fit within 255 bytes diff --git a/opal/mca/pmix/base/p
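
The commit message above concerns values that do not fit within 255 bytes. One common way to handle such values, sketched below, is to split them into pieces and publish each piece under its own derived key, so that a later piece never overwrites an earlier one. This is only an illustration under stated assumptions: the 255-byte limit is taken from the commit message, while `put_chunked`, `put_fn`, `count_put`, `put_stats_t` and the `<base>-<index>` key scheme are hypothetical and are not taken from the actual patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CHUNK_MAX 255   /* per-value limit quoted in the commit message */
#define KEY_MAX   64    /* assumed key buffer size, for illustration */

/* Hypothetical callback standing in for a key-value "put" operation. */
typedef int (*put_fn)(const char *key, const char *chunk, size_t len,
                      void *ctx);

/* Split data into pieces of at most CHUNK_MAX bytes and publish each piece
 * under its own key ("<base>-0", "<base>-1", ...). Returns the number of
 * chunks written, or -1 on error. An empty value yields one empty chunk. */
static int put_chunked(const char *base, const char *data, size_t len,
                       put_fn put, void *ctx)
{
    int nchunks = 0;
    size_t off = 0;

    assert(base != NULL && put != NULL);
    do {
        char key[KEY_MAX];
        size_t n = len - off;
        if (n > CHUNK_MAX) {
            n = CHUNK_MAX;
        }
        int w = snprintf(key, sizeof(key), "%s-%d", base, nchunks);
        if (w < 0 || (size_t)w >= sizeof(key)) {
            return -1;              /* derived key would be truncated */
        }
        if (0 != put(key, data + off, n, ctx)) {
            return -1;
        }
        off += n;
        nchunks++;
    } while (off < len);
    return nchunks;
}

/* Example sink that only records how many chunks and bytes it received. */
typedef struct {
    int calls;
    size_t bytes;
    size_t largest;
} put_stats_t;

static int count_put(const char *key, const char *chunk, size_t len,
                     void *ctx)
{
    put_stats_t *s = ctx;
    (void)key;
    (void)chunk;
    s->calls++;
    s->bytes += len;
    if (len > s->largest) {
        s->largest = len;
    }
    return 0;
}
```

Pushing a 600-byte buffer through `put_chunked` with `count_put` as the sink produces three calls (255 + 255 + 90 bytes), each under a distinct key. A real implementation would also need to publish the chunk count so the receiver knows how many keys to fetch.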

Re: [OMPI devel] OMPI BCOL hang with PMI1

2014-10-17 Thread Gilles Gouaillardet
Artem, There is a known issue #235 with modex and i made PR #238 with a tentative fix. Could you please give it a try and report whether it solves your problem ? Cheers Gilles Artem Polyakov wrote: >Hello, I have troubles with latest trunk if I use PMI1. > > >For example, if
