[hwloc-devel] Create success (hwloc git dev-195-gf100263)

2014-09-01 Thread MPI Team
Creating nightly hwloc snapshot git tarball was a success.

Snapshot:   hwloc dev-195-gf100263
Start time: Mon Sep  1 21:01:02 EDT 2014
End time:   Mon Sep  1 21:02:30 EDT 2014

Your friendly daemon,
Cyrador


Re: [OMPI devel] race condition in coll/ml

2014-09-01 Thread Gilles Gouaillardet
Ralph,

The changeset avoids SIGSEGV by calling mpi_abort before bad things happen.

The attached patch seems to fix the problem (and makes the changeset kind of 
useless).
Once again, the patch has had very little testing and might break other parts of 
coll/ml.

Cheers,

Gilles

Ralph Castain  wrote:
>Usually we have trouble with coll/ml because the process locality isn't being 
>reported sufficiently for its needs. Given the recent change in data exchange, 
>I suspect that is the root cause here - I have a note to Nathan asking for 
>clarification of the coll/ml locality requirement.
>
>Did this patch "fix" the problem by avoiding the segfault due to coll/ml 
>disqualifying itself? Or did it make everything work okay again?
>
>
>On Sep 1, 2014, at 3:16 AM, Gilles Gouaillardet 
> wrote:
>
>> Folks,
>> 
>> mtt recently failed a bunch of times with the trunk.
>> a good suspect is the collective/ibarrier test from the ibm test suite.
>> 
>> most of the time, CHECK_AND_RECYCLE will fail
>> /* IS_COLL_SYNCMEM(coll_op) is true */
>> 
>> with this test case, we just get a glorious SIGSEGV since OBJ_RELEASE is
>> called on MPI_COMM_WORLD (which has *not* been allocated with OBJ_NEW)
>> 
>> i committed r32659 in order to :
>> - display an error message
>> - abort if the communicator is an intrinsic one
>> 
>> with attached modified version of the ibarrier test, i always get an
>> error on task 0 when invoked with
>> mpirun -np 2 -host node0,node1 --mca btl tcp,self ./ibarrier
>> 
>> the modified version adds some sleep(1) in order to work around the race
>> condition and get a reproducible crash
>> 
>> i tried to dig and could not find a correct way to fix this.
>> that being said, i tried the attached ml.patch and it did fix the
>> problem (even with NREQS=1024)
>> i did not commit it since this is very likely incorrect.
>> 
>> could someone have a look ?
>> 
>> Cheers,
>> 
>> Gilles
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/09/15767.php
>
>___
>devel mailing list
>de...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>Link to this post: 
>http://www.open-mpi.org/community/lists/devel/2014/09/15769.php


Re: [OMPI devel] race condition in coll/ml

2014-09-01 Thread Ralph Castain
Usually we have trouble with coll/ml because the process locality isn't being 
reported sufficiently for its needs. Given the recent change in data exchange, 
I suspect that is the root cause here - I have a note to Nathan asking for 
clarification of the coll/ml locality requirement.

Did this patch "fix" the problem by avoiding the segfault due to coll/ml 
disqualifying itself? Or did it make everything work okay again?


On Sep 1, 2014, at 3:16 AM, Gilles Gouaillardet  
wrote:

> Folks,
> 
> mtt recently failed a bunch of times with the trunk.
> a good suspect is the collective/ibarrier test from the ibm test suite.
> 
> most of the time, CHECK_AND_RECYCLE will fail
> /* IS_COLL_SYNCMEM(coll_op) is true */
> 
> with this test case, we just get a glorious SIGSEGV since OBJ_RELEASE is
> called on MPI_COMM_WORLD (which has *not* been allocated with OBJ_NEW)
> 
> i committed r32659 in order to :
> - display an error message
> - abort if the communicator is an intrinsic one
> 
> with attached modified version of the ibarrier test, i always get an
> error on task 0 when invoked with
> mpirun -np 2 -host node0,node1 --mca btl tcp,self ./ibarrier
> 
> the modified version adds some sleep(1) in order to work around the race
> condition and get a reproducible crash
> 
> i tried to dig and could not find a correct way to fix this.
> that being said, i tried the attached ml.patch and it did fix the
> problem (even with NREQS=1024)
> i did not commit it since this is very likely incorrect.
> 
> could someone have a look ?
> 
> Cheers,
> 
> Gilles



Re: [OMPI devel] oshmem-openmpi-1.8.2 causes compile error with -i8(64bit fortran integer) configuration

2014-09-01 Thread tmishima
Gilles,

Thank you for your fix. I successfully compiled it with PGI, although
I could not verify it with an actual test run.

Tetsuya

> Mishima-san,
>
> the root cause is that macro expansion does not always occur as one would
> expect ...
>
> could you please try the attached patch ?
>
> it compiles (at least with gcc) but i have not run any tests so far 
>
> Cheers,
>
> Gilles
>
> On 2014/09/01 10:44, tmish...@jcity.maeda.co.jp wrote:
> > Hi folks,
> >
> > I tried to build openmpi-1.8.2 with PGI fortran and -i8(64bit fortran int)
> > option
> > as shown below:
> >
> > ./configure \
> > --prefix=/home/mishima/opt/mpi/openmpi-1.8.2-pgi14.7_int64 \
> > --enable-abi-breaking-fortran-status-i8-fix \
> > --with-tm \
> > --with-verbs \
> > --disable-ipv6 \
> > CC=pgcc CFLAGS="-tp k8-64e -fast" \
> > CXX=pgCC CXXFLAGS="-tp k8-64e -fast" \
> > F77=pgfortran FFLAGS="-i8 -tp k8-64e -fast" \
> > FC=pgfortran FCFLAGS="-i8 -tp k8-64e -fast"
> >
> > Then I saw this compile error in making oshmem at the last stage:
> >
> > if test ! -r pshmem_real8_swap_f.c ; then \
> > pname=`echo pshmem_real8_swap_f.c | cut -b '2-'` ; \
> > ln -s ../../../../oshmem/shmem/fortran/$pname
> > pshmem_real8_swap_f.c ; \
> > fi
> >   CC   pshmem_real8_swap_f.lo
> > if test ! -r pshmem_int4_cswap_f.c ; then \
> > pname=`echo pshmem_int4_cswap_f.c | cut -b '2-'` ; \
> > ln -s ../../../../oshmem/shmem/fortran/$pname
> > pshmem_int4_cswap_f.c ; \
> > fi
> >   CC   pshmem_int4_cswap_f.lo
> > PGC-S-0058-Illegal lvalue (pshmem_int4_cswap_f.c: 39)
> > PGC/x86-64 Linux 14.7-0: compilation completed with severe errors
> > make[3]: *** [pshmem_int4_cswap_f.lo] Error 1
> > make[3]: Leaving directory
> > `/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem/shmem/fortran/profile'
> > make[2]: *** [all-recursive] Error 1
> > make[2]: Leaving directory
> > `/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem/shmem/fortran'
> > make[1]: *** [all-recursive] Error 1
> > make[1]: Leaving directory
> > `/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem'
> > make: *** [all-recursive] Error 1
> >
> > I confirmed that it worked if I added configure option of --disable-oshmem.
> > So, I hope that oshmem experts would fix this problem.
> >
> > (additional note)
> > I switched to use gnu compiler and checked with this configuration, then
> > I got the same error:
> >
> > ./configure \
> > --prefix=/home/mishima/opt/mpi/openmpi-1.8.2-gnu_int64 \
> > --enable-abi-breaking-fortran-status-i8-fix \
> > --disable-ipv6 \
> > F77=gfortran \
> > FC=gfortran \
> > CC=gcc \
> > CXX=g++ \
> > FFLAGS="-m64 -fdefault-integer-8" \
> > FCFLAGS="-m64 -fdefault-integer-8" \
> > CFLAGS=-m64 \
> > CXXFLAGS=-m64
> >
> > make
> > 
> > if test ! -r pshmem_int4_cswap_f.c ; then \
> > pname=`echo pshmem_int4_cswap_f.c | cut -b '2-'` ; \
> > ln -s ../../../../oshmem/shmem/fortran/$pname
> > pshmem_int4_cswap_f.c ; \
> > fi
> >   CC   pshmem_int4_cswap_f.lo
> > pshmem_int4_cswap_f.c: In function 'shmem_int4_cswap_f':
> > pshmem_int4_cswap_f.c:39: error: invalid lvalue in unary '&'
> > make[3]: *** [pshmem_int4_cswap_f.lo] Error 1
> >
> > Regards
> > Tetsuya Mishima
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2014/08/15764.php
>
>  - oshmem.i8.patch
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Searchable archives:
> http://www.open-mpi.org/community/lists/devel/2014/09/index.php



[OMPI devel] race condition in coll/ml

2014-09-01 Thread Gilles Gouaillardet
Folks,

mtt recently failed a bunch of times with the trunk.
a good suspect is the collective/ibarrier test from the ibm test suite.

most of the time, CHECK_AND_RECYCLE will fail
/* IS_COLL_SYNCMEM(coll_op) is true */

with this test case, we just get a glorious SIGSEGV since OBJ_RELEASE is
called on MPI_COMM_WORLD (which has *not* been allocated with OBJ_NEW)

i committed r32659 in order to :
- display an error message
- abort if the communicator is an intrinsic one

with attached modified version of the ibarrier test, i always get an
error on task 0 when invoked with
mpirun -np 2 -host node0,node1 --mca btl tcp,self ./ibarrier

the modified version adds some sleep(1) in order to work around the race
condition and get a reproducible crash

i tried to dig and could not find a correct way to fix this.
that being said, i tried the attached ml.patch and it did fix the
problem (even with NREQS=1024)
i did not commit it since this is very likely incorrect.

could someone have a look ?

Cheers,

Gilles
/*
 * $HEADER$
 */
/*

 MESSAGE PASSING INTERFACE TEST CASE SUITE

 Copyright IBM Corp. 1995

 IBM Corp. hereby grants a non-exclusive license to use, copy, modify, and
 distribute this software for any purpose and without fee provided that the
 above copyright notice and the following paragraphs appear in all copies.

 IBM Corp. makes no representation that the test cases comprising this
 suite are correct or are an accurate representation of any standard.

 In no event shall IBM be liable to any party for direct, indirect, special
 incidental, or consequential damage arising out of the use of this software
 even if IBM Corp. has been advised of the possibility of such damage.

 IBM CORP. SPECIFICALLY DISCLAIMS ANY WARRANTIES INCLUDING, BUT NOT LIMITED
 TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS AND IBM
 CORP. HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES,
 ENHANCEMENTS, OR MODIFICATIONS.



 These test cases reflect an interpretation of the MPI Standard.  They
 are, in most cases, unit tests of specific MPI behaviors.  If a user of any
 test case from this set believes that the MPI Standard requires behavior
 different than that implied by the test case we would appreciate feedback.

 Comments may be sent to:
Richard Treumann
treum...@kgn.ibm.com


*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include <mpi.h>

#include "ompitest_error.h"

#ifndef NREQS
#define NREQS 16
#endif


int main(int argc, char** argv)
{
    int i, me, rank, tasks;
    double t1, t2;
    MPI_Request req[NREQS];
    MPI_Comm comm;

    MPI_Init(&argc, &argv);

    ompitest_check_size(__FILE__, __LINE__, 2, 1);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Comm_dup(MPI_COMM_WORLD, &comm);

    MPI_Barrier(MPI_COMM_WORLD);
    /* delay the non-root ranks so the race becomes reproducible */
    if (rank > 0) sleep(2);

    /* Do a bunch of barriers */
    for (i = 0; i < NREQS; ++i) {
        MPI_Ibarrier(comm, &req[i]);
    }
    MPI_Waitall(NREQS, req, MPI_STATUSES_IGNORE);
    if (rank > 0) sleep(2);
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
Index: ompi/mca/coll/ml/coll_ml_inlines.h
===
--- ompi/mca/coll/ml/coll_ml_inlines.h  (revision 32658)
+++ ompi/mca/coll/ml/coll_ml_inlines.h  (working copy)
@@ -192,7 +192,7 @@
 !out_of_resource) {
 */
 if (((&coll_op->full_message != coll_op->fragment_data.message_descriptor) &&
-!out_of_resource) || IS_COLL_SYNCMEM(coll_op)) {
+!out_of_resource)) {
 /* non-zero offset ==> this is not fragment 0 */
 CHECK_AND_RECYCLE(coll_op);
 }
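
The crash discussed in this thread comes from releasing an object that was never heap-allocated. The following is only a minimal sketch of that failure mode and of the kind of guard r32659 is described as adding (report and abort for an intrinsic communicator). The type demo_obj_t, the flag heap_allocated, and the functions demo_new and demo_release are hypothetical names for illustration; they are not Open MPI's object system, and the sketch returns an error code instead of aborting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; not Open MPI's opal_object_t. */
typedef struct {
    int  refcount;
    bool heap_allocated;   /* false for predefined objects */
} demo_obj_t;

/* A predefined object with static storage, standing in for an
 * intrinsic communicator such as MPI_COMM_WORLD. */
demo_obj_t demo_world = { 1, false };

/* Allocate a new object on the heap with one reference. */
demo_obj_t *demo_new(void)
{
    demo_obj_t *o = malloc(sizeof *o);
    if (o != NULL) {
        o->refcount = 1;
        o->heap_allocated = true;
    }
    return o;
}

/* Drop one reference.
 * Returns 0 on success, or -1 if dropping the last reference would
 * free() an object that did not come from demo_new(). */
int demo_release(demo_obj_t *o)
{
    assert(o != NULL);
    if (o->refcount == 1 && !o->heap_allocated) {
        return -1;   /* refuse; the caller can report the error and abort */
    }
    if (--o->refcount == 0) {
        free(o);
    }
    return 0;
}
```

Without the guard, the last release of demo_world would pass a non-heap pointer to free(), which is undefined behavior and typically shows up as a SIGSEGV. The guard only turns the crash into a diagnosable error; it does not address the unbalanced release itself.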


[hwloc-devel] preparing v1.10

2014-09-01 Thread Brice Goglin
Hello,
I am planning to release v1.10 soon. We don't have any new big feature,
just random improvements everywhere, see below. If there's something
important you want now, please speak up quickly.
thanks
Brice



* API
  + hwloc_distrib() does not ignore any objects anymore when there are
too many of them. They get merged with others instead.
Thanks to Tim Creech for reporting the issue.
  + Add hwloc_topology_export_synthetic() to export a topology to a
synthetic string without using lstopo. See the Synthetic topologies
section in the documentation.
  + Add hwloc_topology_set/get_userdata() to let the application save
a private pointer in the topology whenever it needs a way to find
its own object corresponding to a topology.
* Tools
  + hwloc-bind --get <command> now executes the command after displaying
the binding instead of ignoring the command entirely.
Thanks to John Donners for the suggestion.
  + Clarify that memory sizes shown in lstopo are local by default
unless specified (total memory added in the root object).
* Synthetic topologies
  + Synthetic topology descriptions may now specify attributes such as
memory sizes and OS indexes. See the Synthetic topologies section
in the documentation.
  + lstopo now exports in this fully-detailed format by default.
The new option --export-synthetic-flags may be used to revert
to the old format.
* Misc
  + Add --disable-cpuid configure flag to work around buggy processor
simulators reporting invalid CPUID information.
Thanks to Andrew Friedley for reporting the issue.
  + Fix a racy use of libltdl when manipulating multiple topologies in
different threads.
Thanks to Andra Hugo for reporting the issue and testing patches.
  + The plugin ABI has changed; this release will not load plugins
built against previous hwloc releases.



Re: [OMPI devel] about the test_shmem_zero_get.x test from the openshmem test suite

2014-09-01 Thread Gilles Gouaillardet
Jeff,

i did not get any reply :-(

from the OpenSHMEM 1.1 specs :

Data object on the PE identified by pe that contains the data to be
copied. This data object must be remotely accessible.

so i assumed the test was incorrect and i committed a new one (r2418)

Cheers,

Gilles

On 2014/08/29 23:41, Jeff Squyres (jsquyres) wrote:
> Gilles --
>
> Did you get a reply about this?
>
>
> On Aug 26, 2014, at 3:17 AM, Gilles Gouaillardet 
>  wrote:
>
>> Folks,
>>
>> the test_shmem_zero_get.x from the openshmem-release-1.0d test suite is
>> currently failing.
>>
>> i looked at the test itself, and compared it to test_shmem_zero_put.x
>> (that is a success) and
>> i am very puzzled ...
>>
>> the test calls several flavors of shmem_*_get where :
>> - the destination is in the shmem (why not, but this is useless)
>> - the source is *not* in the shmem
>> - the number of elements to be transferred is zero
>>
>> currently, this is failing because the source is *not* in the shmem.
>>
>> 1) is the test itself correct ?
>> i mean that if we compare it to test_shmem_zero_put.x, i would guess that
>> destination should be in the local memory and source should be in the shmem.
>>
>> 2) should shmem_*_get even fail ?
>> i mean there is zero data to be transferred, so why do we even care
>> whether source is in the shmem or not ?
>> is the openshmem standard explicit about this case (e.g. zero elements
>> to be transferred) ?
>>
>> 3) is a failure expected ?
>> even if i doubt it, this is an option ... and in this case, mtt should
>> be aware of it and report a success when the test fails
>>
>> 4) the test is a success on v1.8.
>> the reason is the default configure value is --oshmem-param-check=never
>> on v1.8 whereas it is --oshmem-param-check=always on trunk
>> is there any reason for this ?
>>
>> Cheers,
>>
>> Gilles
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/08/15707.php
>



Re: [OMPI devel] oshmem-openmpi-1.8.2 causes compile error with -i8(64bit fortran integer) configuration

2014-09-01 Thread Gilles Gouaillardet
Mishima-san,

the root cause is that macro expansion does not always occur as one would
expect ...

could you please try the attached patch ?

it compiles (at least with gcc) but i have not run any tests so far 

Cheers,

Gilles
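
The "invalid lvalue in unary '&'" error quoted below can be reproduced in isolation. This is a minimal sketch under stated assumptions: fint_t and FINT_2_INT are hypothetical stand-ins, not the real Open MPI definitions, and the temporary-variable workaround shown here is a generic alternative, not what the attached patch does (the patch converts the pointer with OMPI_PFINT_2_PINT instead).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a Fortran INTEGER when built with -i8. */
typedef int64_t fint_t;

/* Hypothetical value-conversion macro. When the Fortran and C integer
 * sizes differ it has to expand to a cast, and the result of a cast is
 * not an lvalue. If it expanded to plain (a), &FINT_2_INT(*p) would
 * compile, which is why the problem only appears with -i8. */
#define FINT_2_INT(a) ((int)(a))

/* Consumer that receives the value through a generic pointer. */
static int read_through_pointer(const void *p)
{
    return *(const int *)p;
}

/* Passing &FINT_2_INT(*cond) directly would be rejected by the compiler.
 * Converting into a named temporary gives an object whose address can
 * be taken. */
int cond_as_int(const fint_t *cond)
{
    assert(cond != NULL);
    int tmp = FINT_2_INT(*cond);        /* tmp is an lvalue */
    return read_through_pointer(&tmp);  /* so &tmp is valid */
}
```

The temporary is only valid for the duration of the call, so this approach fits consumers that read the value immediately and do not keep the pointer.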

On 2014/09/01 10:44, tmish...@jcity.maeda.co.jp wrote:
> Hi folks,
>
> I tried to build openmpi-1.8.2 with PGI fortran and -i8(64bit fortran int)
> option
> as shown below:
>
> ./configure \
> --prefix=/home/mishima/opt/mpi/openmpi-1.8.2-pgi14.7_int64 \
> --enable-abi-breaking-fortran-status-i8-fix \
> --with-tm \
> --with-verbs \
> --disable-ipv6 \
> CC=pgcc CFLAGS="-tp k8-64e -fast" \
> CXX=pgCC CXXFLAGS="-tp k8-64e -fast" \
> F77=pgfortran FFLAGS="-i8 -tp k8-64e -fast" \
> FC=pgfortran FCFLAGS="-i8 -tp k8-64e -fast"
>
> Then I saw this compile error in making oshmem at the last stage:
>
> if test ! -r pshmem_real8_swap_f.c ; then \
> pname=`echo pshmem_real8_swap_f.c | cut -b '2-'` ; \
> ln -s ../../../../oshmem/shmem/fortran/$pname
> pshmem_real8_swap_f.c ; \
> fi
>   CC   pshmem_real8_swap_f.lo
> if test ! -r pshmem_int4_cswap_f.c ; then \
> pname=`echo pshmem_int4_cswap_f.c | cut -b '2-'` ; \
> ln -s ../../../../oshmem/shmem/fortran/$pname
> pshmem_int4_cswap_f.c ; \
> fi
>   CC   pshmem_int4_cswap_f.lo
> PGC-S-0058-Illegal lvalue (pshmem_int4_cswap_f.c: 39)
> PGC/x86-64 Linux 14.7-0: compilation completed with severe errors
> make[3]: *** [pshmem_int4_cswap_f.lo] Error 1
> make[3]: Leaving directory
> `/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem/shmem/fortran/profile'
> make[2]: *** [all-recursive] Error 1
> make[2]: Leaving directory
> `/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem/shmem/fortran'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory
> `/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem'
> make: *** [all-recursive] Error 1
>
> I confirmed that it worked if I added configure option of --disable-oshmem.
> So, I hope that oshmem experts would fix this problem.
>
> (additional note)
> I switched to use gnu compiler and checked with this configuration, then
> I got the same error:
>
> ./configure \
> --prefix=/home/mishima/opt/mpi/openmpi-1.8.2-gnu_int64 \
> --enable-abi-breaking-fortran-status-i8-fix \
> --disable-ipv6 \
> F77=gfortran \
> FC=gfortran \
> CC=gcc \
> CXX=g++ \
> FFLAGS="-m64 -fdefault-integer-8" \
> FCFLAGS="-m64 -fdefault-integer-8" \
> CFLAGS=-m64 \
> CXXFLAGS=-m64
>
> make
> 
> if test ! -r pshmem_int4_cswap_f.c ; then \
> pname=`echo pshmem_int4_cswap_f.c | cut -b '2-'` ; \
> ln -s ../../../../oshmem/shmem/fortran/$pname
> pshmem_int4_cswap_f.c ; \
> fi
>   CC   pshmem_int4_cswap_f.lo
> pshmem_int4_cswap_f.c: In function 'shmem_int4_cswap_f':
> pshmem_int4_cswap_f.c:39: error: invalid lvalue in unary '&'
> make[3]: *** [pshmem_int4_cswap_f.lo] Error 1
>
> Regards
> Tetsuya Mishima
>

Index: oshmem/shmem/fortran/shmem_int4_cswap_f.c
===
--- oshmem/shmem/fortran/shmem_int4_cswap_f.c   (revision 32657)
+++ oshmem/shmem/fortran/shmem_int4_cswap_f.c   (working copy)
@@ -2,6 +2,8 @@
  * Copyright (c) 2013  Mellanox Technologies, Inc.
  * All rights reserved.
  * Copyright (c) 2013 Cisco Systems, Inc.  All rights reserved.
+ * Copyright (c) 2014  Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  * 
  * Additional copyrights may follow
@@ -38,7 +40,7 @@

 MCA_ATOMIC_CALL(cswap(FPTR_2_VOID_PTR(target), 
 (void *)&out_value, 
-(const void*)(&OMPI_FINT_2_INT(*cond)), 
+(const void*)(OMPI_PFINT_2_PINT(cond)), 
 FPTR_2_VOID_PTR(value), 
 sizeof(out_value), 
 OMPI_FINT_2_INT(*pe)));
Index: oshmem/shmem/fortran/shmem_int8_cswap_f.c
===
--- oshmem/shmem/fortran/shmem_int8_cswap_f.c   (revision 32657)
+++ oshmem/shmem/fortran/shmem_int8_cswap_f.c   (working copy)
@@ -2,6 +2,8 @@
  * Copyright (c) 2013  Mellanox Technologies, Inc.
  * All rights reserved.
  * Copyright (c) 2013 Cisco Systems, Inc.  All rights reserved.
+ * Copyright (c) 2014  Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  * 
  * Additional copyrights may follow
@@ -38,7 +40,7 @@

 MCA_ATOMIC_CALL(cswap(FPTR_2_VOID_PTR(target), 
 (void *)&out_value, 
-(const void*)(&OMPI_FINT_2_INT(*cond)), 
+(const