Re: [OMPI devel] ompi-tests SVN repo has been moved to Github

2014-09-16 Thread Jeff Squyres (jsquyres)
Two things:

1. Be sure to run "git pull --rebase" in your MTT checkout to get the latest updates to MTT.

2. Update your INI files with something like this (a subset of a patch):

[Test get: ibm]
module = SCM
-scm_module = SVN
-scm_url = svn://savbu-usnic.cisco.com/ompi-tests/trunk/ibm
+scm_module = Git
+scm_url = /home/mpiteam/mirror-git/ompi-tests
+scm_subdir = ibm

Note the changes:
- SVN -> Git
- URL change (notice that I happen to have a local cache of the git repo, but 
you can put in a git:// or https:// URL here, too)
- new field: scm_subdir.  In SVN, we could just check out the "trunk/ibm" part 
of the SVN repo.  But you can only clone the entire repo in Git, so the new 
field specifies the subdir you want MTT to operate in inside the clone.  Hence, 
we specify "ibm" here.

Feel free to look at ompi-tests/cisco/community/*ini for some examples.




On Sep 16, 2014, at 3:41 PM, Jeff Squyres (jsquyres)  wrote:

> Good question.  Info coming shortly (i.e., I'm updating mine right now; will 
> share the results).
> 
> 
> On Sep 16, 2014, at 3:28 PM, Ralph Castain  wrote:
> 
>> And how do we need to modify our .ini scripts?
>> 
>> On Sep 16, 2014, at 12:22 PM, Jeff Squyres (jsquyres)  
>> wrote:
>> 
>>> The ompi-tests SVN repo has been moved to Github.  The SVN repo is now in 
>>> read-only mode.
>>> 
>>> Just like the SVN ompi-tests repo, the Github ompi-tests repo is private.  
>>> You need to be an active member of the Open MPI organization to access it.
>>> You can use your developer Github ID or you can request the read-only 
>>> account password from me (e.g., if you were having MTT use the "ompi-tests" 
>>> account for SVN access).
>>> 
>>> You are strongly encouraged to update your MTT to fetch from the new Github 
>>> repo:
>>> 
>>>  https://github.com/open-mpi/ompi-tests
>>> 
>>> -- 
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to: 
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2014/09/15847.php
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/09/15848.php
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15850.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] PSM MTL work with srun launch?

2014-09-16 Thread Ralph Castain
Odd - it used to do the right thing. Looking at the code, it certainly looks 
like it is doing the right thing.

It uses the PMI ID for the first part of the key and the step ID for the 
second part. One possibility we've seen with Slurm is that PMI_Init can report 
success but actually fail. If that happens, they may not be getting a valid 
PMI ID.

One easy way to check the setting of the key is to run a simple program 
that calls MPI_Init and then does a printenv.


On Sep 16, 2014, at 2:24 PM, Pritchard Jr., Howard  wrote:

> Hi Folks,
>  
> I’m getting questions about 1.8.2 with the PSM MTL and Slurm direct launch
> (srun).
> 
> The user is hitting a problem where the global ID is not being set.
> 
> I’ve suggested that, for now, the user just set the
> OMPI_MCA_orte_precondition_transports environment variable to something like
> 
> export OMPI_MCA_orte_precondition_transports="efa1-9c43"
> 
> perhaps generating the first field from the value of SLURM_JOBID.
> 
> In older versions, was an attempt made to fall back to the ibverbs BTL when
> using srun for job launch?
>  
> Howard
>  
>  
> -
> Howard Pritchard
> HPC-5
> Los Alamos National Laboratory
>  
>  
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15851.php



[OMPI devel] PSM MTL work with srun launch?

2014-09-16 Thread Pritchard Jr., Howard
Hi Folks,

I'm getting questions about 1.8.2 with the PSM MTL and Slurm direct launch
(srun).

The user is hitting a problem where the global ID is not being set.

I've suggested that, for now, the user just set the
OMPI_MCA_orte_precondition_transports environment variable to something like

export OMPI_MCA_orte_precondition_transports="efa1-9c43"

perhaps generating the first field from the value of SLURM_JOBID.

In older versions, was an attempt made to fall back to the ibverbs BTL when
using srun for job launch?

Howard


-
Howard Pritchard
HPC-5
Los Alamos National Laboratory




Re: [OMPI devel] ompi-tests SVN repo has been moved to Github

2014-09-16 Thread Jeff Squyres (jsquyres)
Good question.  Info coming shortly (i.e., I'm updating mine right now; will 
share the results).


On Sep 16, 2014, at 3:28 PM, Ralph Castain  wrote:

> And how do we need to modify our .ini scripts?
> 
> On Sep 16, 2014, at 12:22 PM, Jeff Squyres (jsquyres)  
> wrote:
> 
>> The ompi-tests SVN repo has been moved to Github.  The SVN repo is now in 
>> read-only mode.
>> 
>> Just like the SVN ompi-tests repo, the Github ompi-tests repo is private.  
>> You need to be an active member of the Open MPI organization to access it.
>> You can use your developer Github ID or you can request the read-only 
>> account password from me (e.g., if you were having MTT use the "ompi-tests" 
>> account for SVN access).
>> 
>> You are strongly encouraged to update your MTT to fetch from the new Github 
>> repo:
>> 
>>   https://github.com/open-mpi/ompi-tests
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/09/15847.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15848.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



[OMPI devel] Git: open-mpi/ompi-tests branch master updated. c2134f3ed25d0b327f631365b6f5f60f4ef7140d

2014-09-16 Thread gitdub
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "open-mpi/ompi-tests".

The branch, master has been updated
   via  c2134f3ed25d0b327f631365b6f5f60f4ef7140d (commit)
  from  cd84ff8722f2aee4107f7046592e12552bf38441 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -
https://github.com/open-mpi/ompi-tests/commit/c2134f3ed25d0b327f631365b6f5f60f4ef7140d

commit c2134f3ed25d0b327f631365b6f5f60f4ef7140d
Author: Jeff Squyres 
Date:   Tue Sep 16 12:10:42 2014 -0700

.gitignore: first version

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 000..8446664
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,935 @@
+.libs
+.deps
+.libs
+.svn
+*.la
+*.lo
+*.o
+*.so
+*.a
+.dirstamp
+*.dSYM
+*.S
+*.loT
+*.orig
+*.rej
+*.class
+*.xcscheme
+*.plist
+*~
+*\\#
+
+Makefile
+Makefile.in
+
+b_eff/b_eff.short
+b_eff/b_eff.tex
+b_eff/b_eff
+b_eff/b_eff.prot
+b_eff/b_eff.sum
+b_eff/b_eff.gps
+b_eff/b_eff.plot
+
+cxx-test-suite/configure
+cxx-test-suite/config.in
+cxx-test-suite/config.log
+cxx-test-suite/config.status
+cxx-test-suite/autom4te.cache
+cxx-test-suite/aclocal.m4
+cxx-test-suite/config/depcomp
+cxx-test-suite/config/missing
+cxx-test-suite/config/config.guess
+cxx-test-suite/config/config.sub
+cxx-test-suite/config/ltmain.sh
+cxx-test-suite/config/install-sh
+cxx-test-suite/src/mpi2c++_test
+cxx-test-suite/src/mpi2c++_dynamics_test
+cxx-test-suite/src/test_config.h
+cxx-test-suite/src/test_config.h.in
+cxx-test-suite/src/stamp-h1
+cxx-test-suite/src/connect
+
+ibm/aclocal.m4
+ibm/config.log
+ibm/config.status
+ibm/configure
+ibm/autom4te.cache
+ibm/libtool
+ibm/collective/allgather
+ibm/collective/allgatherv
+ibm/collective/allreduce
+ibm/collective/alltoall
+ibm/collective/alltoallvw_zero
+ibm/collective/alltoallv_somezeros
+ibm/collective/alltoallw
+ibm/collective/barrier
+ibm/collective/bcast
+ibm/collective/bcast_struct
+ibm/collective/exscan
+ibm/collective/gather
+ibm/collective/op
+ibm/collective/reduce
+ibm/collective/reduce_scatter
+ibm/collective/scan
+ibm/collective/scatter
+ibm/collective/struct_gatherv
+ibm/collective/*.log
+ibm/collective/*.trs
+ibm/collective/*.mod
+ibm/collective/reduce_scatter_in_place
+ibm/collective/reduce_big
+ibm/collective/reduce_loc
+ibm/collective/allreduce_in_place
+ibm/collective/allgather_in_place
+ibm/collective/scan_in_place
+ibm/collective/gatherv
+ibm/collective/reduce_in_place
+ibm/collective/gather_in_place
+ibm/collective/allgatherv_in_place
+ibm/collective/scatterv_in_place
+ibm/collective/gatherv_in_place
+ibm/collective/scatterv
+ibm/collective/scatter_in_place
+ibm/collective/exscan_in_place
+ibm/collective/reduce-complex-c
+ibm/collective/bcast_f08
+ibm/collective/ibcast_f08
+ibm/collective/ibarrier_f08
+ibm/collective/ibarrier_f90
+ibm/collective/reduce-complex
+ibm/collective/ibarrier
+ibm/collective/ibcast_f90
+ibm/collective/ibarrier_f
+ibm/collective/ibcast
+ibm/collective/ibcast_f
+ibm/collective/reduce_scatter_block
+ibm/collective/ireduce_in_place
+ibm/collective/ireduce_loc
+ibm/collective/igatherv_in_place
+ibm/collective/iallreduce_in_place
+ibm/collective/iscatterv
+ibm/collective/iscatter_in_place
+ibm/collective/iscatterv_in_place
+ibm/collective/ireduce-complex-c
+ibm/collective/ialltoall
+ibm/collective/ireduce
+ibm/collective/iscan
+ibm/collective/igatherv
+ibm/collective/ibcast_struct
+ibm/collective/iallreduce
+ibm/collective/igather_in_place
+ibm/collective/ireduce_scatter_block
+ibm/collective/igather
+ibm/collective/iallgatherv_in_place
+ibm/collective/ireduce_scatter
+ibm/collective/iallgather
+ibm/collective/iscan_in_place
+ibm/collective/ireduce_big
+ibm/collective/iallgatherv
+ibm/collective/iallgather_in_place
+ibm/collective/istruct_gatherv
+ibm/collective/ireduce_scatter_in_place
+ibm/collective/iscatter
+ibm/collective/alltoall_in_place
+ibm/collective/alltoallw_in_place
+ibm/collective/iexscan
+ibm/collective/iexscan_in_place
+ibm/collective/ineighbor_allgather
+ibm/collective/ineighbor_allgatherv
+ibm/collective/ineighbor_alltoall
+ibm/collective/ineighbor_alltoallv
+ibm/collective/ineighbor_alltoallw
+ibm/collective/neighbor_allgather
+ibm/collective/neighbor_allgatherv
+ibm/collective/neighbor_alltoall
+ibm/collective/neighbor_alltoallv
+ibm/collective/neighbor_alltoallw
+ibm/collective/op_mpifh
+ibm/collective/op_usempi
+ibm/collective/op_usempif08
+ibm/collective/optest.*
+ibm/collective/intercomm/allgather_inter
+ibm/collective/intercomm/allreduce_inter
+ibm/collective/intercomm/alltoall_inter
+ibm/collective/intercomm/alltoallv_inter
+ibm/collective/intercomm/alltoallw_inter
+ibm/collective/intercomm/barrier_inter
+ibm/collective/intercomm/bcast_inter

Re: [OMPI devel] ompi-tests SVN repo has been moved to Github

2014-09-16 Thread Ralph Castain
And how do we need to modify our .ini scripts?

On Sep 16, 2014, at 12:22 PM, Jeff Squyres (jsquyres)  
wrote:

> The ompi-tests SVN repo has been moved to Github.  The SVN repo is now in 
> read-only mode.
> 
> Just like the SVN ompi-tests repo, the Github ompi-tests repo is private.  
> You need to be an active member of the Open MPI organization to access it.
> You can use your developer Github ID or you can request the read-only account 
> password from me (e.g., if you were having MTT use the "ompi-tests" account 
> for SVN access).
> 
> You are strongly encouraged to update your MTT to fetch from the new Github 
> repo:
> 
>    https://github.com/open-mpi/ompi-tests
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15847.php



[OMPI devel] ompi-tests SVN repo has been moved to Github

2014-09-16 Thread Jeff Squyres (jsquyres)
The ompi-tests SVN repo has been moved to Github.  The SVN repo is now in 
read-only mode.

Just like the SVN ompi-tests repo, the Github ompi-tests repo is private.  You 
need to be an active member of the Open MPI organization to access it.  You can 
use your developer Github ID or you can request the read-only account password 
from me (e.g., if you were having MTT use the "ompi-tests" account for SVN 
access).

You are strongly encouraged to update your MTT to fetch from the new Github 
repo:

https://github.com/open-mpi/ompi-tests

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] oshmem-openmpi-1.8.2 causes compile error with -i8 (64bit fortran integer) configuration

2014-09-16 Thread Ralph Castain
I took care of it - thanks!

On Sep 16, 2014, at 7:36 AM, tmish...@jcity.maeda.co.jp wrote:

> Gilles,
> 
> Your patch looks good to me and I think this issue should be fixed
> in the upcoming openmpi-1.8.3. Could you commit it to the trunk and
> create a CMR for it?
> 
> Tetsuya
> 
>> Mishima-san,
>> 
>> the root cause is that macro expansion does not always occur as one would
>> expect...
>> 
>> could you please give the attached patch a try?
>> 
>> it compiles (at least with gcc), but I have not tested it at all so far.
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> On 2014/09/01 10:44, tmish...@jcity.maeda.co.jp wrote:
>>> Hi folks,
>>> 
>>> I tried to build openmpi-1.8.2 with PGI Fortran and the -i8 (64-bit Fortran
>>> integer) option as shown below:
>>> 
>>> ./configure \
>>> --prefix=/home/mishima/opt/mpi/openmpi-1.8.2-pgi14.7_int64 \
>>> --enable-abi-breaking-fortran-status-i8-fix \
>>> --with-tm \
>>> --with-verbs \
>>> --disable-ipv6 \
>>> CC=pgcc CFLAGS="-tp k8-64e -fast" \
>>> CXX=pgCC CXXFLAGS="-tp k8-64e -fast" \
>>> F77=pgfortran FFLAGS="-i8 -tp k8-64e -fast" \
>>> FC=pgfortran FCFLAGS="-i8 -tp k8-64e -fast"
>>> 
>>> Then I saw this compile error while building oshmem at the last stage:
>>> 
>>> if test ! -r pshmem_real8_swap_f.c ; then \
>>>    pname=`echo pshmem_real8_swap_f.c | cut -b '2-'` ; \
>>>    ln -s ../../../../oshmem/shmem/fortran/$pname pshmem_real8_swap_f.c ; \
>>>    fi
>>>  CC   pshmem_real8_swap_f.lo
>>> if test ! -r pshmem_int4_cswap_f.c ; then \
>>>    pname=`echo pshmem_int4_cswap_f.c | cut -b '2-'` ; \
>>>    ln -s ../../../../oshmem/shmem/fortran/$pname pshmem_int4_cswap_f.c ; \
>>>    fi
>>>  CC   pshmem_int4_cswap_f.lo
>>> PGC-S-0058-Illegal lvalue (pshmem_int4_cswap_f.c: 39)
>>> PGC/x86-64 Linux 14.7-0: compilation completed with severe errors
>>> make[3]: *** [pshmem_int4_cswap_f.lo] Error 1
>>> make[3]: Leaving directory `/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem/shmem/fortran/profile'
>>> make[2]: *** [all-recursive] Error 1
>>> make[2]: Leaving directory `/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem/shmem/fortran'
>>> make[1]: *** [all-recursive] Error 1
>>> make[1]: Leaving directory `/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem'
>>> make: *** [all-recursive] Error 1
>>> 
>>> I confirmed that it builds if I add the configure option --disable-oshmem.
>>> So, I hope that the oshmem experts can fix this problem.
>>> 
>>> (additional note)
>>> I switched to the GNU compilers and checked with this configuration, and
>>> I got the same error:
>>> 
>>> ./configure \
>>> --prefix=/home/mishima/opt/mpi/openmpi-1.8.2-gnu_int64 \
>>> --enable-abi-breaking-fortran-status-i8-fix \
>>> --disable-ipv6 \
>>> F77=gfortran \
>>> FC=gfortran \
>>> CC=gcc \
>>> CXX=g++ \
>>> FFLAGS="-m64 -fdefault-integer-8" \
>>> FCFLAGS="-m64 -fdefault-integer-8" \
>>> CFLAGS=-m64 \
>>> CXXFLAGS=-m64
>>> 
>>> make
>>> 
>>> if test ! -r pshmem_int4_cswap_f.c ; then \
>>>    pname=`echo pshmem_int4_cswap_f.c | cut -b '2-'` ; \
>>>    ln -s ../../../../oshmem/shmem/fortran/$pname pshmem_int4_cswap_f.c ; \
>>>    fi
>>>  CC   pshmem_int4_cswap_f.lo
>>> pshmem_int4_cswap_f.c: In function 'shmem_int4_cswap_f':
>>> pshmem_int4_cswap_f.c:39: error: invalid lvalue in unary '&'
>>> make[3]: *** [pshmem_int4_cswap_f.lo] Error 1
>>> 
>>> Regards
>>> Tetsuya Mishima
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/08/15764.php
>> 
>> - oshmem.i8.patch
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Searchable archives: http://www.open-mpi.org/community/lists/devel/2014/09/index.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15838.php



Re: [OMPI devel] 1.8.3rc1 - start your engines

2014-09-16 Thread Ralph Castain
Thanks!

On Sep 16, 2014, at 11:40 AM, Paul Hargrove  wrote:

> The ARM results finished a couple days back and the MIPS results (3 ABIs to 
> test) finally completed overnight.
> In the meantime I was able to schedule tests of most of my menagerie of 
> Intel, PGI, Sun, Pathscale, and Open64 compilers on x86-64, and some IBM 
> compiler tests on PPC64 (but *NOT* yet the latest compiler release available 
> to me).
> 
> Other than the known issues with various compilers (such as the need to
> explicitly disable F08 bindings with some PGI versions), there were no
> problems found in 1.8.3rc1.
> 
> There may be some results later for the IBM compiler I didn't get to, and 
> possibly for Clang on Linux.
> 
> -Paul
> 
> On Sun, Sep 14, 2014 at 8:55 PM, Ralph Castain  wrote:
> Your contributions are always appreciated, Paul - thanks!
> 
> On Sep 13, 2014, at 7:51 PM, Paul Hargrove  wrote:
> 
>> Ralph,
>> 
>> I am not sure if I will have time to run my full suite of configurations, 
>> including all the PGI, Sun, Intel and IBM compilers on Linux.
>> 
>> However, the following non-(Linux/x86-64) platforms have passed:
>> 
>> + Linux/{PPC32,PPC64,IA64}
>> + Solaris-10/{SPARC-v8+,SPARC-v9} (Oracle and GNU compilers)
>> + Solaris-11/{amd64,i386} (Oracle and GNU compilers)
>> + NetBSD-6/{amd64,i386}
>> + OpenBSD-5/{amd64,i386}
>> + FreeBSD-10/{amd64,i386}
>> 
>> I've started runs on my ARM and MIPS Linux systems, but those results will 
>> take a while.
>> 
>> -Paul
>> 
>> On Sat, Sep 13, 2014 at 11:23 AM, Ralph Castain  wrote:
>> Hi folks
>> 
>> Time to start the release process with rc1 - please test and report issues:
>> 
>> http://www.open-mpi.org/software/ompi/v1.8/
>> 
>> Ralph
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/09/15822.php
>> 
>> 
>> 
>> -- 
>> Paul H. Hargrove  phhargr...@lbl.gov
>> Future Technologies Group
>> Computer and Data Sciences Department Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/09/15823.php
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15826.php
> 
> 
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15841.php



Re: [OMPI devel] CONVERSION TO GITHUB

2014-09-16 Thread Paul Hargrove
Jeff,

So the instruction from your reply is "create a Github account if you wish
to continue filing tickets".

But don't you want/need the trac->github account mapping now to convert
existing tickets?
For instance, I am "phargrov" in your Trac, but "PHHargrove" at github.

And by the way, on the wiki page
https://github.com/open-mpi/ompi/wiki/SubmittingBugs you might consider
adding a link to the issue tracker for folks not familiar with Github
navigation.

-Paul

On Tue, Sep 16, 2014 at 11:47 AM, Jeff Squyres (jsquyres) <
jsquy...@cisco.com> wrote:

> Not really.
>
> One minor point: you'll need a Github account to file Github issues (i.e.,
> what's replacing Trac tickets) and/or use the code commenting tools.
>
>
>
> On Sep 16, 2014, at 2:33 PM, Paul Hargrove  wrote:
>
> > Jeff,
> >
> > Any instructions for those who have never had Subversion accounts, but
> > do have Trac accounts?
> > You know... the people like me who primarily just make work for others :-)
> >
> > -Paul
> >
> > On Tue, Sep 16, 2014 at 10:34 AM, Jeff Squyres (jsquyres) <
> > jsquy...@cisco.com> wrote:
> > Short version
> > =
> >
> > - I have added / invited all users to the "ompi" Github repo.
> >   *** You need to join and then "unwatch" the "ompi" repo on Github ASAP ***
> >
> > - The github migration is planned for *next* Wednesday: 24 Sep, 2014
> >   ALL OMPI ACTIVITY MUST STOP THAT DAY: commits, tickets, wiki
> >
> > - Go read the new OMPI wiki pages about Git / Github.  They talk about
> > how we're going to use Git/Github, etc.  Please reply here with comments,
> > suggestions, questions, etc.:
> >
> >   https://github.com/open-mpi/ompi/wiki
> >
> > More detail
> > ===
> >
> > The Github migration has been planned for Wednesday, 24 Sep 2014.  The
> > migration will start at 8am US Eastern time, and will take all day.
> > Subversion and Trac will be placed into read-only status at the beginning
> > of the migration.
> >
> > *** Please reply ASAP if this date does not work for you.
> >
> > *** If you're a current OMPI developer, you MUST join and "unwatch" the
> > "ompi" repo before the migration date (i.e., go to
> > https://github.com/open-mpi/ompi/ and click the "Unwatch" button in the
> > top right and select "Ignoring").  If you don't join, you make the
> > migration harder for me (please don't do that).  If you don't "unwatch",
> > you will get a ZILLION emails when the migration actually occurs.  YOU HAVE
> > BEEN WARNED.
> >
> > There's much more information about the Github migration on this wiki
> > page:
> >
> >  https://github.com/open-mpi/ompi/wiki/GithubMigration
> >
> > Go read it.  GO READ IT NOW.
> >
> > I will send out an "all clear" email next Wednesday when Github is ready
> > to use.  At that point, it will be safe (and recommended) to start Watching
> > the "ompi" repo again.
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2014/09/15839.php
> >
> >
> >
> > --
> > Paul H. Hargrove  phhargr...@lbl.gov
> > Future Technologies Group
> > Computer and Data Sciences Department Tel: +1-510-495-2352
> > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2014/09/15840.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/09/15842.php
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] CONVERSION TO GITHUB

2014-09-16 Thread Jeff Squyres (jsquyres)
Not really.

One minor point: you'll need a Github account to file Github issues (i.e., 
what's replacing Trac tickets) and/or use the code commenting tools.



On Sep 16, 2014, at 2:33 PM, Paul Hargrove  wrote:

> Jeff,
> 
> Any instructions for those who have never had Subversion accounts, but do 
> have Trac accounts?
> You know... the people like me who primarily just make work for others :-)
> 
> -Paul
> 
> On Tue, Sep 16, 2014 at 10:34 AM, Jeff Squyres (jsquyres) 
>  wrote:
> Short version
> =
> 
> - I have added / invited all users to the "ompi" Github repo.
>   *** You need to join and then "unwatch" the "ompi" repo on Github ASAP ***
> 
> - The github migration is planned for *next* Wednesday: 24 Sep, 2014
>   ALL OMPI ACTIVITY MUST STOP THAT DAY: commits, tickets, wiki
> 
> - Go read the new OMPI wiki pages about Git / Github.  They talk about how 
> we're going to use Git/Github, etc.  Please reply here with comments, 
> suggestions, questions, etc.:
> 
>   https://github.com/open-mpi/ompi/wiki
> 
> More detail
> ===
> 
> The Github migration has been planned for Wednesday, 24 Sep 2014.  The 
> migration will start at 8am US Eastern time, and will take all day.  
> Subversion and Trac will be placed into read-only status at the beginning of 
> the migration.
> 
> *** Please reply ASAP if this date does not work for you.
> 
> *** If you're a current OMPI developer, you MUST join and "unwatch" the 
> "ompi" repo before the migration date (i.e., go to 
> https://github.com/open-mpi/ompi/ and click the "Unwatch" button in the top 
> right and select "Ignoring").  If you don't join, you make the migration 
> harder for me (please don't do that).  If you don't "unwatch", you will get a 
> ZILLION emails when the migration actually occurs.  YOU HAVE BEEN WARNED.
> 
> There's much more information about the Github migration on this wiki page:
> 
>  https://github.com/open-mpi/ompi/wiki/GithubMigration
> 
> Go read it.  GO READ IT NOW.
> 
> I will send out an "all clear" email next Wednesday when Github is ready to 
> use.  At that point, it will be safe (and recommended) to start Watching the 
> "ompi" repo again.
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15839.php
> 
> 
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15840.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] 1.8.3rc1 - start your engines

2014-09-16 Thread Paul Hargrove
The ARM results finished a couple days back and the MIPS results (3 ABIs to
test) finally completed overnight.
In the meantime I was able to schedule tests of most of my menagerie of
Intel, PGI, Sun, Pathscale, and Open64 compilers on x86-64, and some IBM
compiler tests on PPC64 (but *NOT* yet the latest compiler release
available to me).

Other than the known issues with various compilers (such as the need to
explicitly disable F08 bindings with some PGI versions), there were no
problems found in 1.8.3rc1.

There may be some results later for the IBM compiler I didn't get to, and
possibly for Clang on Linux.

-Paul

On Sun, Sep 14, 2014 at 8:55 PM, Ralph Castain  wrote:

> Your contributions are always appreciated, Paul - thanks!
>
> On Sep 13, 2014, at 7:51 PM, Paul Hargrove  wrote:
>
> Ralph,
>
> I am not sure if I will have time to run my full suite of configurations,
> including all the PGI, Sun, Intel and IBM compilers on Linux.
>
> However, the following non-(Linux/x86-64) platforms have passed:
>
> + Linux/{PPC32,PPC64,IA64}
> + Solaris-10/{SPARC-v8+,SPARC-v9} (Oracle and GNU compilers)
> + Solaris-11/{amd64,i386} (Oracle and GNU compilers)
> + NetBSD-6/{amd64,i386}
> + OpenBSD-5/{amd64,i386}
> + FreeBSD-10/{amd64,i386}
>
> I've started runs on my ARM and MIPS Linux systems, but those results will
> take a while.
>
> -Paul
>
> On Sat, Sep 13, 2014 at 11:23 AM, Ralph Castain  wrote:
>
>> Hi folks
>>
>> Time to start the release process with rc1 - please test and report
>> issues:
>>
>> http://www.open-mpi.org/software/ompi/v1.8/
>>
>> Ralph
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/09/15822.php
>>
>
>
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>  ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/09/15823.php
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/09/15826.php
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] CONVERSION TO GITHUB

2014-09-16 Thread Paul Hargrove
Jeff,

Any instructions for those who have never had Subversion accounts, but do
have Trac accounts?
You know... the people like me who primarily just make work for others :-)

-Paul

On Tue, Sep 16, 2014 at 10:34 AM, Jeff Squyres (jsquyres) <
jsquy...@cisco.com> wrote:

> Short version
> =
>
> - I have added / invited all users to the "ompi" Github repo.
>   *** You need to join and then "unwatch" the "ompi" repo on Github ASAP ***
>
> - The github migration is planned for *next* Wednesday: 24 Sep, 2014
>   ALL OMPI ACTIVITY MUST STOP THAT DAY: commits, tickets, wiki
>
> - Go read the new OMPI wiki pages about Git / Github.  They talk about how
> we're going to use Git/Github, etc.  Please reply here with comments,
> suggestions, questions, etc.:
>
>   https://github.com/open-mpi/ompi/wiki
>
> More detail
> ===
>
> The Github migration has been planned for Wednesday, 24 Sep 2014.  The
> migration will start at 8am US Eastern time, and will take all day.
> Subversion and Trac will be placed into read-only status at the beginning
> of the migration.
>
> *** Please reply ASAP if this date does not work for you.
>
> *** If you're a current OMPI developer, you MUST join and "unwatch" the
> "ompi" repo before the migration date (i.e., go to
> https://github.com/open-mpi/ompi/ and click the "Unwatch" button in the
> top right and select "Ignoring").  If you don't join, you make the
> migration harder for me (please don't do that).  If you don't "unwatch",
> you will get a ZILLION emails when the migration actually occurs.  YOU HAVE
> BEEN WARNED.
>
> There's much more information about the Github migration on this wiki page:
>
>  https://github.com/open-mpi/ompi/wiki/GithubMigration
>
> Go read it.  GO READ IT NOW.
>
> I will send out an "all clear" email next Wednesday when Github is ready
> to use.  At that point, it will be safe (and recommended) to start Watching
> the "ompi" repo again.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/09/15839.php
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


[OMPI devel] CONVERSION TO GITHUB

2014-09-16 Thread Jeff Squyres (jsquyres)
Short version
=============

- I have added / invited all users to the "ompi" Github repo.
  *** You need to join and then "unwatch" the "ompi" repo on Github ASAP ***

- The Github migration is planned for *next* Wednesday: 24 Sep, 2014
  ALL OMPI ACTIVITY MUST STOP THAT DAY: commits, tickets, wiki

- Go read the new OMPI wiki pages about Git / Github.  They talk about how we're 
going to use Git/Github, etc.  Please reply here with comments, suggestions, 
questions, etc.:

  https://github.com/open-mpi/ompi/wiki

More detail
===========

The Github migration has been planned for Wednesday, 24 Sep 2014.  The 
migration will start at 8am US Eastern time, and will take all day.  Subversion 
and Trac will be placed into read-only status at the beginning of the migration.

*** Please reply ASAP if this date does not work for you.

*** If you're a current OMPI developer, you MUST join and "unwatch" the "ompi" 
repo before the migration date (i.e., go to https://github.com/open-mpi/ompi/ 
and click the "Unwatch" button in the top right and select "Ignoring").  If you 
don't join, you make the migration harder for me (please don't do that).  If 
you don't "unwatch", you will get a ZILLION emails when the migration actually 
occurs.  YOU HAVE BEEN WARNED.

There's much more information about the Github migration on this wiki page:

 https://github.com/open-mpi/ompi/wiki/GithubMigration

Go read it.  GO READ IT NOW.

I will send out an "all clear" email next Wednesday when Github is ready to 
use.  At that point, it will be safe (and recommended) to start Watching the 
"ompi" repo again.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] oshmem-openmpi-1.8.2 causes compile error with -i8(64bit fortran integer) configuration

2014-09-16 Thread tmishima
Gilles,

Your patch looks good to me and I think this issue should be fixed
in the upcoming openmpi-1.8.3. Could you commit it to the trunk and
create a CMR for it?

Tetsuya

> Mishima-san,
>
> The root cause is that macro expansion does not always occur as one would
> expect ...
>
> Could you please try the attached patch?
>
> It compiles (at least with gcc), but I have not tested it at all so far.
>
> Cheers,
>
> Gilles
>
> On 2014/09/01 10:44, tmish...@jcity.maeda.co.jp wrote:
> > Hi folks,
> >
> > I tried to build openmpi-1.8.2 with PGI Fortran and the -i8 (64-bit Fortran
> > integer) option, as shown below:
> >
> > ./configure \
> > --prefix=/home/mishima/opt/mpi/openmpi-1.8.2-pgi14.7_int64 \
> > --enable-abi-breaking-fortran-status-i8-fix \
> > --with-tm \
> > --with-verbs \
> > --disable-ipv6 \
> > CC=pgcc CFLAGS="-tp k8-64e -fast" \
> > CXX=pgCC CXXFLAGS="-tp k8-64e -fast" \
> > F77=pgfortran FFLAGS="-i8 -tp k8-64e -fast" \
> > FC=pgfortran FCFLAGS="-i8 -tp k8-64e -fast"
> >
> > Then I saw this compile error while building oshmem, at the last stage:
> >
> > if test ! -r pshmem_real8_swap_f.c ; then \
> > pname=`echo pshmem_real8_swap_f.c | cut -b '2-'` ; \
> > ln -s ../../../../oshmem/shmem/fortran/$pname pshmem_real8_swap_f.c ; \
> > fi
> >   CC   pshmem_real8_swap_f.lo
> > if test ! -r pshmem_int4_cswap_f.c ; then \
> > pname=`echo pshmem_int4_cswap_f.c | cut -b '2-'` ; \
> > ln -s ../../../../oshmem/shmem/fortran/$pname pshmem_int4_cswap_f.c ; \
> > fi
> >   CC   pshmem_int4_cswap_f.lo
> > PGC-S-0058-Illegal lvalue (pshmem_int4_cswap_f.c: 39)
> > PGC/x86-64 Linux 14.7-0: compilation completed with severe errors
> > make[3]: *** [pshmem_int4_cswap_f.lo] Error 1
> > make[3]: Leaving directory `/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem/shmem/fortran/profile'
> > make[2]: *** [all-recursive] Error 1
> > make[2]: Leaving directory `/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem/shmem/fortran'
> > make[1]: *** [all-recursive] Error 1
> > make[1]: Leaving directory `/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem'
> > make: *** [all-recursive] Error 1
> >
> > I confirmed that the build works if I add the --disable-oshmem configure
> > option. So, I hope the oshmem experts can fix this problem.
> >
> > (additional note)
> > I switched to the GNU compilers, checked with the following configuration,
> > and got the same error:
> >
> > ./configure \
> > --prefix=/home/mishima/opt/mpi/openmpi-1.8.2-gnu_int64 \
> > --enable-abi-breaking-fortran-status-i8-fix \
> > --disable-ipv6 \
> > F77=gfortran \
> > FC=gfortran \
> > CC=gcc \
> > CXX=g++ \
> > FFLAGS="-m64 -fdefault-integer-8" \
> > FCFLAGS="-m64 -fdefault-integer-8" \
> > CFLAGS=-m64 \
> > CXXFLAGS=-m64
> >
> > make
> > 
> > if test ! -r pshmem_int4_cswap_f.c ; then \
> > pname=`echo pshmem_int4_cswap_f.c | cut -b '2-'` ; \
> > ln -s ../../../../oshmem/shmem/fortran/$pname pshmem_int4_cswap_f.c ; \
> > fi
> > fi
> >   CC   pshmem_int4_cswap_f.lo
> > pshmem_int4_cswap_f.c: In function 'shmem_int4_cswap_f':
> > pshmem_int4_cswap_f.c:39: error: invalid lvalue in unary '&'
> > make[3]: *** [pshmem_int4_cswap_f.lo] Error 1
> >
> > Regards
> > Tetsuya Mishima
> >
> > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2014/08/15764.php
>
>  - oshmem.i8.patch (attachment)



[OMPI devel] OPAL timing framework

2014-09-16 Thread Artem Polyakov
Hello,

I would like to introduce the OMPI timing framework that was committed to the
trunk yesterday (r32738). The code is new, so if you hit any bugs, just let
me know.

The framework consists of a set of macros and routines for internal OMPI
use, plus a standalone tool, mpisync, and a few additional scripts:
mpirun_prof and ompi_timing_post. The feature set is very basic, and I am
open to discussing new features that would be desirable.

To build the framework, configure OMPI with the --enable-timing option. If
this option is passed to ./configure, the standalone tools and scripts will
be installed into /bin.

The timing code is located in OPAL (opal/utils/timing.[ch]). There is a set
of macros that should be used so that all mentions of the timing code are
preprocessed out when timing was not requested with --enable-timing:
OPAL_TIMING_DECLARE(t) - declare a timing handler structure named "t".
OPAL_TIMING_DECLARE_EXT(x, t) - external declaration of a timing handler
  "t".
OPAL_TIMING_INIT(t) - initialize timing handler "t".
OPAL_TIMING_EVENT(x) - printf-like event declaration, similar to OPAL_OUTPUT.
  Information about the event is quickly inserted into a linked list. The
  maximum event description length is limited by OPAL_TIMING_DESCR_MAX.
  Memory is allocated in buckets (OPAL_TIMING_BUFSIZE at once), and the
  overhead (time to malloc and prepare the bucket) is accounted for in the
  corresponding list element. It can be excluded from the timing results
  (controlled by the OMPI_MCA_opal_timing_overhead parameter).
OPAL_TIMING_REPORT(enable, t, prefix) - prepare and print out timing
  information. If OMPI_MCA_opal_timing_file is specified, the output goes to
  that file. Otherwise the output is directed through opal_output, and each
  line is prefixed with "prefix" to ease grep'ing. "enable" is a
  boolean/integer variable used to select at runtime what should be
  reported.
OPAL_TIMING_RELEASE(t) - the counterpart of OPAL_TIMING_INIT.

There are several examples in the OMPI code. Here is another simple example:
OPAL_TIMING_DECLARE(tm);
OPAL_TIMING_INIT();
...
OPAL_TIMING_EVENT((,"Begin of timing: %s",
ORTE_NAME_PRINT(&(peer->name)) ));

OPAL_TIMING_EVENT((,"Next timing event with condition x = %d", x ));
...
OPAL_TIMING_EVENT((,"Finish"));
OPAL_TIMING_REPORT(enable_var, ,"MPI Init");
OPAL_TIMING_RELEASE();


The output from all OMPI processes (mpirun, orteds, user processes) is
merged together. NTP provides precision on the order of 1 millisecond to
100 microseconds, which may not be sufficient to order events globally.
To help developers extract the most realistic picture of what is going on,
additional time synchronisation can be performed before profiling. The
mpisync program should be run with one user process per node to obtain a
file with the time offset of each node relative to the HNP. If the cluster
runs over Gigabit Ethernet, the precision will be 30-50 microseconds; over
InfiniBand, 4 microseconds. mpisync produces an output file that can be
read and used by the timing framework (OMPI_MCA_opal_clksync_file
parameter). The bad news is that this one-time synchronisation is not
enough, because the clock skew differs from node to node, so additional
periodic synchronisation is needed. This is planned for the near future
(Ralph and I are discussing possible approaches now).

The mpirun_prof and ompi_timing_post scripts can be used to automate clock
synchronisation as follows:
export OMPI_MCA_ompi_timing=true
export OMPI_MCA_orte_oob_timing=true
export OMPI_MCA_orte_rml_timing=true
export OMPI_MCA_opal_timing_file=timing.out
mpirun_prof  ./mpiprog
ompi_timing_post timing.out

ompi_timing_post simply sorts the events and makes all times relative to
the first one.

-- 
Best regards, Artem Y. Polyakov


[OMPI devel] race condition in oob/tcp

2014-09-16 Thread Gilles Gouaillardet
Ralph,

Here is the full description of a race condition in oob/tcp that I very
briefly mentioned in a previous post:

The race condition can occur when two orteds that are not yet connected try
to send a message to each other for the first time, at the same time.

This can occur when running an MPI hello world on 4 nodes with the
grpcomm/rcd module.

Here is a scenario in which the race condition occurs:

orted vpids 2 and 3 enter the allgather
/* they are not yet connected via orte oob/tcp */
and they call orte.send_buffer_nb to each other.
From a libevent point of view, vpids 2 and 3 will both call
mca_oob_tcp_peer_try_connect.

vpid 2 calls mca_oob_tcp_send_handler.

vpid 3 calls connection_event_handler.

Depending on the value returned by random() in libevent, vpid 3 will call
either mca_oob_tcp_send_handler (likely) or recv_handler (unlikely).
If vpid 3 calls recv_handler, it will close the two sockets to vpid 2.

Then vpid 2 will call mca_oob_tcp_recv_handler
(peer->state is MCA_OOB_TCP_CONNECT_ACK),
which will invoke mca_oob_tcp_recv_connect_ack.
tcp_peer_recv_blocking will fail
/* zero bytes are received, since vpid 3 closed the socket before
writing a header */
and this is handled by mca_oob_tcp_recv_handler as a fatal error
/* ORTE_FORCED_TERMINATE(1) */

Could you please have a look at it?

If you are too busy, could you please advise where this scenario should be
handled differently?
- should vpid 3 keep one socket instead of closing both and retrying?
- should vpid 2 handle the failure as a non-fatal error?

Cheers,

Gilles