[hwloc-devel] Create success (hwloc git dev-195-gf100263)
Creating nightly hwloc snapshot git tarball was a success.

Snapshot:   hwloc dev-195-gf100263
Start time: Mon Sep 1 21:01:02 EDT 2014
End time:   Mon Sep 1 21:02:30 EDT 2014

Your friendly daemon,
Cyrador
Re: [OMPI devel] race condition in coll/ml
Ralph,

The changeset avoids the SIGSEGV by calling mpi_abort before bad things happen.

The attached patch seems to fix the problem (and makes the changeset kind of useless).
Once again, the patch has had very little testing and might break other parts of coll/ml.

Cheers,

Gilles

Ralph Castain wrote:
>Usually we have trouble with coll/ml because the process locality isn't being
>reported sufficiently for its needs. Given the recent change in data exchange,
>I suspect that is the root cause here - I have a note to Nathan asking for
>clarification of the coll/ml locality requirement.
>
>Did this patch "fix" the problem by avoiding the segfault due to coll/ml
>disqualifying itself? Or did it make everything work okay again?
>
>
>On Sep 1, 2014, at 3:16 AM, Gilles Gouaillardet wrote:
>
>> Folks,
>>
>> mtt recently failed a bunch of times with the trunk.
>> a good suspect is the collective/ibarrier test from the ibm test suite.
>>
>> most of the time, CHECK_AND_RECYCLE will fail
>> /* IS_COLL_SYNCMEM(coll_op) is true */
>>
>> with this test case, we just get a glorious SIGSEGV since OBJ_RELEASE is
>> called on MPI_COMM_WORLD (which has *not* been allocated with OBJ_NEW)
>>
>> i committed r32659 in order to :
>> - display an error message
>> - abort if the communicator is an intrinsic one
>>
>> with attached modified version of the ibarrier test, i always get an
>> error on task 0 when invoked with
>> mpirun -np 2 -host node0,node1 --mca btl tcp,self ./ibarrier
>>
>> the modified version adds some sleep(1) in order to work around the race
>> condition and get a reproducible crash
>>
>> i tried to dig and could not find a correct way to fix this.
>> that being said, i tried the attached ml.patch and it did fix the
>> problem (even with NREQS=1024)
>> i did not commit it since this is very likely incorrect.
>>
>> could someone have a look ?
>>
>> Cheers,
>>
>> Gilles
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/09/15767.php
>
>___
>devel mailing list
>de...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>Link to this post:
>http://www.open-mpi.org/community/lists/devel/2014/09/15769.php
Re: [OMPI devel] race condition in coll/ml
Usually we have trouble with coll/ml because the process locality isn't being
reported sufficiently for its needs. Given the recent change in data exchange,
I suspect that is the root cause here - I have a note to Nathan asking for
clarification of the coll/ml locality requirement.

Did this patch "fix" the problem by avoiding the segfault due to coll/ml
disqualifying itself? Or did it make everything work okay again?


On Sep 1, 2014, at 3:16 AM, Gilles Gouaillardet wrote:

> Folks,
>
> mtt recently failed a bunch of times with the trunk.
> a good suspect is the collective/ibarrier test from the ibm test suite.
>
> most of the time, CHECK_AND_RECYCLE will fail
> /* IS_COLL_SYNCMEM(coll_op) is true */
>
> with this test case, we just get a glorious SIGSEGV since OBJ_RELEASE is
> called on MPI_COMM_WORLD (which has *not* been allocated with OBJ_NEW)
>
> i committed r32659 in order to :
> - display an error message
> - abort if the communicator is an intrinsic one
>
> with attached modified version of the ibarrier test, i always get an
> error on task 0 when invoked with
> mpirun -np 2 -host node0,node1 --mca btl tcp,self ./ibarrier
>
> the modified version adds some sleep(1) in order to work around the race
> condition and get a reproducible crash
>
> i tried to dig and could not find a correct way to fix this.
> that being said, i tried the attached ml.patch and it did fix the
> problem (even with NREQS=1024)
> i did not commit it since this is very likely incorrect.
>
> could someone have a look ?
>
> Cheers,
>
> Gilles
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/09/15767.php
Re: [OMPI devel] oshmem-openmpi-1.8.2 causes compile error with -i8 (64bit Fortran integer) configuration
Gilles,

Thank you for your fix. I successfully compiled it with PGI, although
I could not verify it with an actual test run.

Tetsuya

> Mishima-san,
>
> the root cause is macro expansion does not always occur as one would
> have expected ...
>
> could you please give a try to the attached patch ?
>
> it compiles (at least with gcc) and i made zero tests so far
>
> Cheers,
>
> Gilles
>
> On 2014/09/01 10:44, tmish...@jcity.maeda.co.jp wrote:
> > Hi folks,
> >
> > I tried to build openmpi-1.8.2 with PGI fortran and -i8(64bit fortran int)
> > option
> > as shown below:
> >
> > ./configure \
> > --prefix=/home/mishima/opt/mpi/openmpi-1.8.2-pgi14.7_int64 \
> > --enable-abi-breaking-fortran-status-i8-fix \
> > --with-tm \
> > --with-verbs \
> > --disable-ipv6 \
> > CC=pgcc CFLAGS="-tp k8-64e -fast" \
> > CXX=pgCC CXXFLAGS="-tp k8-64e -fast" \
> > F77=pgfortran FFLAGS="-i8 -tp k8-64e -fast" \
> > FC=pgfortran FCFLAGS="-i8 -tp k8-64e -fast"
> >
> > Then I saw this compile error in making oshmem at the last stage:
> >
> > if test ! -r pshmem_real8_swap_f.c ; then \
> > pname=`echo pshmem_real8_swap_f.c | cut -b '2-'` ; \
> > ln -s ../../../../oshmem/shmem/fortran/$pname pshmem_real8_swap_f.c ; \
> > fi
> > CC pshmem_real8_swap_f.lo
> > if test ! -r pshmem_int4_cswap_f.c ; then \
> > pname=`echo pshmem_int4_cswap_f.c | cut -b '2-'` ; \
> > ln -s ../../../../oshmem/shmem/fortran/$pname pshmem_int4_cswap_f.c ; \
> > fi
> > CC pshmem_int4_cswap_f.lo
> > PGC-S-0058-Illegal lvalue (pshmem_int4_cswap_f.c: 39)
> > PGC/x86-64 Linux 14.7-0: compilation completed with severe errors
> > make[3]: *** [pshmem_int4_cswap_f.lo] Error 1
> > make[3]: Leaving directory `/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem/shmem/fortran/profile'
> > make[2]: *** [all-recursive] Error 1
> > make[2]: Leaving directory `/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem/shmem/fortran'
> > make[1]: *** [all-recursive] Error 1
> > make[1]: Leaving directory `/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem'
> > make: *** [all-recursive] Error 1
> >
> > I confirmed that it worked if I added configure option of --disable-oshmem.
> > So, I hope that oshmem experts would fix this problem.
> >
> > (additional note)
> > I switched to use gnu compiler and checked with this configuration, then
> > I got the same error:
> >
> > ./configure \
> > --prefix=/home/mishima/opt/mpi/openmpi-1.8.2-gnu_int64 \
> > --enable-abi-breaking-fortran-status-i8-fix \
> > --disable-ipv6 \
> > F77=gfortran \
> > FC=gfortran \
> > CC=gcc \
> > CXX=g++ \
> > FFLAGS="-m64 -fdefault-integer-8" \
> > FCFLAGS="-m64 -fdefault-integer-8" \
> > CFLAGS=-m64 \
> > CXXFLAGS=-m64
> >
> > make
> >
> > if test ! -r pshmem_int4_cswap_f.c ; then \
> > pname=`echo pshmem_int4_cswap_f.c | cut -b '2-'` ; \
> > ln -s ../../../../oshmem/shmem/fortran/$pname pshmem_int4_cswap_f.c ; \
> > fi
> > CC pshmem_int4_cswap_f.lo
> > pshmem_int4_cswap_f.c: In function 'shmem_int4_cswap_f':
> > pshmem_int4_cswap_f.c:39: error: invalid lvalue in unary '&'
> > make[3]: *** [pshmem_int4_cswap_f.lo] Error 1
> >
> > Regards
> > Tetsuya Mishima
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post: http://www.open-mpi.org/community/lists/devel/2014/08/15764.php
>
> - oshmem.i8.patch
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Searchable archives: http://www.open-mpi.org/community/lists/devel/2014/09/index.php
[OMPI devel] race condition in coll/ml
Folks,

mtt recently failed a bunch of times with the trunk.
a good suspect is the collective/ibarrier test from the ibm test suite.

most of the time, CHECK_AND_RECYCLE will fail
/* IS_COLL_SYNCMEM(coll_op) is true */

with this test case, we just get a glorious SIGSEGV since OBJ_RELEASE is
called on MPI_COMM_WORLD (which has *not* been allocated with OBJ_NEW)

i committed r32659 in order to :
- display an error message
- abort if the communicator is an intrinsic one

with attached modified version of the ibarrier test, i always get an
error on task 0 when invoked with
mpirun -np 2 -host node0,node1 --mca btl tcp,self ./ibarrier

the modified version adds some sleep(1) in order to work around the race
condition and get a reproducible crash

i tried to dig and could not find a correct way to fix this.
that being said, i tried the attached ml.patch and it did fix the
problem (even with NREQS=1024)
i did not commit it since this is very likely incorrect.

could someone have a look ?

Cheers,

Gilles

/*
 * $HEADER$
 */
/****************************************************************************

 MESSAGE PASSING INTERFACE TEST CASE SUITE

 Copyright IBM Corp. 1995

 IBM Corp. hereby grants a non-exclusive license to use, copy, modify, and
 distribute this software for any purpose and without fee provided that the
 above copyright notice and the following paragraphs appear in all copies.

 IBM Corp. makes no representation that the test cases comprising this
 suite are correct or are an accurate representation of any standard.

 In no event shall IBM be liable to any party for direct, indirect, special
 incidental, or consequential damage arising out of the use of this software
 even if IBM Corp. has been advised of the possibility of such damage.

 IBM CORP. SPECIFICALLY DISCLAIMS ANY WARRANTIES INCLUDING, BUT NOT LIMITED
 TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS AND IBM
 CORP. HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES,
 ENHANCEMENTS, OR MODIFICATIONS.

 These test cases reflect an interpretation of the MPI Standard. They are,
 in most cases, unit tests of specific MPI behaviors. If a user of any test
 case from this set believes that the MPI Standard requires behavior
 different than that implied by the test case we would appreciate feedback.

 Comments may be sent to:
    Richard Treumann
    treum...@kgn.ibm.com

****************************************************************************/

/* The four system header names were lost in the archive. The ones below are
 * an assumption: sleep() needs <unistd.h> and the MPI calls need <mpi.h>. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <mpi.h>
#include "ompitest_error.h"

#ifndef NREQS
#define NREQS 16
#endif

int main(int argc, char** argv)
{
    int i, me, rank, tasks;
    double t1, t2;
    MPI_Request req[NREQS];
    MPI_Comm comm;

    MPI_Init(&argc, &argv);
    ompitest_check_size(__FILE__, __LINE__, 2, 1);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_dup(MPI_COMM_WORLD, &comm);

    MPI_Barrier(MPI_COMM_WORLD);
    if (rank > 0) sleep(2);

    /* Do a bunch of barriers */
    for (i = 0; i < NREQS; ++i) {
        MPI_Ibarrier(comm, &req[i]);
    }
    MPI_Waitall(NREQS, req, MPI_STATUSES_IGNORE);

    if (rank > 0) sleep(2);
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}

Index: ompi/mca/coll/ml/coll_ml_inlines.h
===================================================================
--- ompi/mca/coll/ml/coll_ml_inlines.h (revision 32658)
+++ ompi/mca/coll/ml/coll_ml_inlines.h (working copy)
@@ -192,7 +192,7 @@
 !out_of_resource) {
 */
 if (((&coll_op->full_message != coll_op->fragment_data.message_descriptor) &&
-!out_of_resource) || IS_COLL_SYNCMEM(coll_op)) {
+!out_of_resource)) {
 /* non-zero offset ==> this is not fragment 0 */
 CHECK_AND_RECYCLE(coll_op);
 }
[hwloc-devel] preparing v1.10
Hello,

I am planning to release v1.10 soon. We don't have any big new feature, just
random improvements everywhere, see below. If there's something important you
want now, please speak up quickly.

thanks
Brice


* API
  + hwloc_distrib() does not ignore any objects anymore when there are too
    many of them. They get merged with others instead.
    Thanks to Tim Creech for reporting the issue.
  + Add hwloc_topology_export_synthetic() to export a topology to a
    synthetic string without using lstopo. See the Synthetic topologies
    section in the documentation.
  + Add hwloc_topology_set/get_userdata() to let the application save
    a private pointer in the topology whenever it needs a way to find
    its own object corresponding to a topology.
* Tools
  + hwloc-bind --get now executes the command after displaying
    the binding instead of ignoring the command entirely.
    Thanks to John Donners for the suggestion.
  + Clarify that memory sizes shown in lstopo are local by default
    unless specified (total memory added in the root object).
* Synthetic topologies
  + Synthetic topology descriptions may now specify attributes such as
    memory sizes and OS indexes. See the Synthetic topologies section
    in the documentation.
  + lstopo now exports in this fully-detailed format by default.
    The new option --export-synthetic-flags may be used to revert to
    the old format.
* Misc
  + Add --disable-cpuid configure flag to work around buggy processor
    simulators reporting invalid CPUID information.
    Thanks to Andrew Friedley for reporting the issue.
  + Fix a racy use of libltdl when manipulating multiple topologies in
    different threads.
    Thanks to Andra Hugo for reporting the issue and testing patches.
  + The plugin ABI has changed, this release will not load plugins
    built against previous hwloc releases.
Re: [OMPI devel] about the test_shmem_zero_get.x test from the openshmem test suite
Jeff,

i did not get any reply :-(

from the OpenSHMEM 1.1 specs :

    Data object on the PE identified by pe that contains the data to be
    copied. This data object must be remotely accessible.

so i assumed the test was incorrect and i committed a new one (r2418)

Cheers,

Gilles

On 2014/08/29 23:41, Jeff Squyres (jsquyres) wrote:
> Gilles --
>
> Did you get a reply about this?
>
>
> On Aug 26, 2014, at 3:17 AM, Gilles Gouaillardet wrote:
>
>> Folks,
>>
>> the test_shmem_zero_get.x from the openshmem-release-1.0d test suite is
>> currently failing.
>>
>> i looked at the test itself, and compared it to test_shmem_zero_put.x
>> (that is a success) and
>> i am very puzzled ...
>>
>> the test calls several flavors of shmem_*_get where :
>> - the destination is in the shmem (why not, but this is useless)
>> - the source is *not* in the shmem
>> - the number of elements to be transferred is zero
>>
>> currently, this is failing because the source is *not* in the shmem.
>>
>> 1) is the test itself correct ?
>> i mean that if we compare it to test_shmem_zero_put.x, i would guess that
>> destination should be in the local memory and source should be in the shmem.
>>
>> 2) should shmem_*_get even fail ?
>> i mean there is zero data to be transferred, so why do we even care
>> whether source is in the shmem or not ?
>> is the openshmem standard explicit about this case (e.g. zero elements
>> to be transferred) ?
>>
>> 3) is a failure expected ?
>> even if i doubt it, this is an option ... and in this case, mtt should
>> be aware about it and report a success when the test fails
>>
>> 4) the test is a success on v1.8.
>> the reason is the default configure value is --oshmem-param-check=never
>> on v1.8 whereas it is --oshmem-param-check=always on trunk
>> is there any reason for this ?
>>
>> Cheers,
>>
>> Gilles
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/08/15707.php
>
Re: [OMPI devel] oshmem-openmpi-1.8.2 causes compile error with -i8 (64bit Fortran integer) configuration
Mishima-san,

the root cause is macro expansion does not always occur as one would
have expected ...

could you please give a try to the attached patch ?

it compiles (at least with gcc) and i made zero tests so far

Cheers,

Gilles

On 2014/09/01 10:44, tmish...@jcity.maeda.co.jp wrote:
> Hi folks,
>
> I tried to build openmpi-1.8.2 with PGI fortran and -i8(64bit fortran int)
> option
> as shown below:
>
> ./configure \
> --prefix=/home/mishima/opt/mpi/openmpi-1.8.2-pgi14.7_int64 \
> --enable-abi-breaking-fortran-status-i8-fix \
> --with-tm \
> --with-verbs \
> --disable-ipv6 \
> CC=pgcc CFLAGS="-tp k8-64e -fast" \
> CXX=pgCC CXXFLAGS="-tp k8-64e -fast" \
> F77=pgfortran FFLAGS="-i8 -tp k8-64e -fast" \
> FC=pgfortran FCFLAGS="-i8 -tp k8-64e -fast"
>
> Then I saw this compile error in making oshmem at the last stage:
>
> if test ! -r pshmem_real8_swap_f.c ; then \
> pname=`echo pshmem_real8_swap_f.c | cut -b '2-'` ; \
> ln -s ../../../../oshmem/shmem/fortran/$pname pshmem_real8_swap_f.c ; \
> fi
> CC pshmem_real8_swap_f.lo
> if test ! -r pshmem_int4_cswap_f.c ; then \
> pname=`echo pshmem_int4_cswap_f.c | cut -b '2-'` ; \
> ln -s ../../../../oshmem/shmem/fortran/$pname pshmem_int4_cswap_f.c ; \
> fi
> CC pshmem_int4_cswap_f.lo
> PGC-S-0058-Illegal lvalue (pshmem_int4_cswap_f.c: 39)
> PGC/x86-64 Linux 14.7-0: compilation completed with severe errors
> make[3]: *** [pshmem_int4_cswap_f.lo] Error 1
> make[3]: Leaving directory `/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem/shmem/fortran/profile'
> make[2]: *** [all-recursive] Error 1
> make[2]: Leaving directory `/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem/shmem/fortran'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory `/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem'
> make: *** [all-recursive] Error 1
>
> I confirmed that it worked if I added configure option of --disable-oshmem.
> So, I hope that oshmem experts would fix this problem.
>
> (additional note)
> I switched to use gnu compiler and checked with this configuration, then
> I got the same error:
>
> ./configure \
> --prefix=/home/mishima/opt/mpi/openmpi-1.8.2-gnu_int64 \
> --enable-abi-breaking-fortran-status-i8-fix \
> --disable-ipv6 \
> F77=gfortran \
> FC=gfortran \
> CC=gcc \
> CXX=g++ \
> FFLAGS="-m64 -fdefault-integer-8" \
> FCFLAGS="-m64 -fdefault-integer-8" \
> CFLAGS=-m64 \
> CXXFLAGS=-m64
>
> make
>
> if test ! -r pshmem_int4_cswap_f.c ; then \
> pname=`echo pshmem_int4_cswap_f.c | cut -b '2-'` ; \
> ln -s ../../../../oshmem/shmem/fortran/$pname pshmem_int4_cswap_f.c ; \
> fi
> CC pshmem_int4_cswap_f.lo
> pshmem_int4_cswap_f.c: In function 'shmem_int4_cswap_f':
> pshmem_int4_cswap_f.c:39: error: invalid lvalue in unary '&'
> make[3]: *** [pshmem_int4_cswap_f.lo] Error 1
>
> Regards
> Tetsuya Mishima
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15764.php

Index: oshmem/shmem/fortran/shmem_int4_cswap_f.c
===================================================================
--- oshmem/shmem/fortran/shmem_int4_cswap_f.c (revision 32657)
+++ oshmem/shmem/fortran/shmem_int4_cswap_f.c (working copy)
@@ -2,6 +2,8 @@
  * Copyright (c) 2013 Mellanox Technologies, Inc.
  * All rights reserved.
  * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2014 Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -38,7 +40,7 @@
 MCA_ATOMIC_CALL(cswap(FPTR_2_VOID_PTR(target),
 (void *)&out_value,
-(const void*)(&OMPI_FINT_2_INT(*cond)),
+(const void*)(OMPI_PFINT_2_PINT(cond)),
 FPTR_2_VOID_PTR(value),
 sizeof(out_value),
 OMPI_FINT_2_INT(*pe)));
Index: oshmem/shmem/fortran/shmem_int8_cswap_f.c
===================================================================
--- oshmem/shmem/fortran/shmem_int8_cswap_f.c (revision 32657)
+++ oshmem/shmem/fortran/shmem_int8_cswap_f.c (working copy)
@@ -2,6 +2,8 @@
  * Copyright (c) 2013 Mellanox Technologies, Inc.
  * All rights reserved.
  * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2014 Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -38,7 +40,7 @@
 MCA_ATOMIC_CALL(cswap(FPTR_2_VOID_PTR(target),
 (void *)&out_value,
-(const void*)(&OMPI_FINT_2_INT(*cond)),
+(const