Re: [OMPI users] possible bug exercised by mpi4py
On 23 May 9:37PM, Jonathan Dursi wrote:
> On the other hand, it works everywhere if I pad the rcounts array with an extra valid value (0 or 1, or for that matter 783), or replace the allgatherv with an allgather.

... and it fails with 7 even where it worked (but succeeds with 8) if I pad rcounts with an extra invalid value which should never be read.

Should the recvcounts[] parameter check in allgatherv.c loop up to size = ompi_comm_remote_size(comm), as is done in alltoallv.c, rather than to ompi_comm_size(comm)? That seems to avoid the problem.

- Jonathan

--
Jonathan Dursi | SciNet, Compute/Calcul Canada | www.SciNetHPC.ca
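For concreteness, the change being suggested would look something like this (a sketch only; the variable names follow the surrounding discussion and are not copied from the actual Open MPI source):

    /* On an intercommunicator, recvcounts[] has one entry per *remote*
     * rank, so the parameter check should be sized accordingly: */
    if (OMPI_COMM_IS_INTER(comm)) {
        size = ompi_comm_remote_size(comm);  /* instead of ompi_comm_size(comm) */
    } else {
        size = ompi_comm_size(comm);
    }
    for (i = 0; i < size; ++i) {
        if (recvcounts[i] < 0) {
            err = MPI_ERR_COUNT;  /* the "invalid count argument" seen in this thread */
            break;
        }
    }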
Re: [OMPI users] possible bug exercised by mpi4py
In case it is helpful to those who may not have the Intel compilers, these are the libraries against which the two executables of Lisandro's allgather.c get linked:

with Intel compilers:
=
$ ldd a.out
    linux-vdso.so.1 => (0x7fffb7dfd000)
    libmpi.so.0 => /home/software/rhel5/openmpi-1.4.3/intel-11.0/lib/libmpi.so.0 (0x2b460cec2000)
    libopen-rte.so.0 => /home/software/rhel5/openmpi-1.4.3/intel-11.0/lib/libopen-rte.so.0 (0x2b460d198000)
    libopen-pal.so.0 => /home/software/rhel5/openmpi-1.4.3/intel-11.0/lib/libopen-pal.so.0 (0x2b460d40)
    libdl.so.2 => /lib64/libdl.so.2 (0x003f6a20)
    libnsl.so.1 => /lib64/libnsl.so.1 (0x003f6fe0)
    libutil.so.1 => /lib64/libutil.so.1 (0x003f7460)
    libm.so.6 => /lib64/libm.so.6 (0x003f6a60)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x003f6ae0)
    libc.so.6 => /lib64/libc.so.6 (0x003f69e0)
    libimf.so => /usr/caen/intel-11.0/fc/11.0.074/lib/intel64/libimf.so (0x2b460d69f000)
    libsvml.so => /usr/caen/intel-11.0/fc/11.0.074/lib/intel64/libsvml.so (0x2b460d9f6000)
    libintlc.so.5 => /usr/caen/intel-11.0/fc/11.0.074/lib/intel64/libintlc.so.5 (0x2b460dbb3000)
    libgcc_s.so.1 => /home/software/rhel5/gcc/4.6.2/lib64/libgcc_s.so.1 (0x2b460dcf)
    /lib64/ld-linux-x86-64.so.2 (0x003f69a0)
=
with GCC 4.6.2
=
$ ldd a.out
    linux-vdso.so.1 => (0x7fff93dfd000)
    libmpi.so.0 => /home/software/rhel5/openmpi-1.4.4/gcc-4.6.2/lib/libmpi.so.0 (0x2ab3ba523000)
    libopen-rte.so.0 => /home/software/rhel5/openmpi-1.4.4/gcc-4.6.2/lib/libopen-rte.so.0 (0x2ab3ba7cf000)
    libopen-pal.so.0 => /home/software/rhel5/openmpi-1.4.4/gcc-4.6.2/lib/libopen-pal.so.0 (0x2ab3baa1d000)
    libdl.so.2 => /lib64/libdl.so.2 (0x003f6a20)
    libnsl.so.1 => /lib64/libnsl.so.1 (0x003f6fe0)
    libutil.so.1 => /lib64/libutil.so.1 (0x003f7460)
    libm.so.6 => /lib64/libm.so.6 (0x003f6a60)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x003f6ae0)
    libc.so.6 => /lib64/libc.so.6 (0x003f69e0)
    /lib64/ld-linux-x86-64.so.2 (0x003f69a0)
=

-- bennet
--
East Hall Technical Services
Mathematics and Psychology Research Computing
University of Michigan
(734) 763-1182
Re: [OMPI users] possible bug exercised by mpi4py
Fails for me with 1.4.3 with gcc, but works with intel; works with 1.4.4 with gcc or intel; fails with 1.5.5 with either. Succeeds with intelmpi. On the other hand, it works everywhere if I pad the rcounts array with an extra valid value (0 or 1, or for that matter 783), or replace the allgatherv with an allgather. - Jonathan -- Jonathan Dursi | SciNet, Compute/Calcul Canada | www.SciNetHPC.ca
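To be concrete about the padding workaround, the caller side looks roughly like this (a fragment, assuming an existing intercommunicator and one int contributed per rank; the extra slot keeps an out-of-bounds read of rcounts[rsize] inside valid memory):

    int rsize, sendval = 42;
    MPI_Comm_remote_size(intercomm, &rsize);
    int *rcounts = malloc((rsize + 1) * sizeof(int));  /* one extra slot */
    int *displs  = malloc(rsize * sizeof(int));
    int *recvbuf = malloc(rsize * sizeof(int));
    for (int i = 0; i < rsize; ++i) { rcounts[i] = 1; displs[i] = i; }
    rcounts[rsize] = 0;  /* padding; a correct implementation never reads it */
    MPI_Allgatherv(&sendval, 1, MPI_INT, recvbuf, rcounts, displs,
                   MPI_INT, intercomm);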
Re: [OMPI users] possible bug exercised by mpi4py
On Wed, 23 May 2012, Lisandro Dalcin wrote:
> On 23 May 2012 19:04, Jeff Squyres wrote:
>> Thanks for all the info! But still, can we get a copy of the test in C? That would make it significantly easier for us to tell if there is a problem with Open MPI -- mainly because we don't know anything about the internals of mpi4py.
>
> FYI, this test ran fine with previous (but recent, let's say 1.5.4) Open MPI versions, but fails with 1.6. The test also runs fine with MPICH2.

I compiled the C example Lisandro provided using openmpi/1.4.3 built with the Intel 11.0 compilers, and it ran the first time. I then recompiled using gcc 4.6.2 and openmpi 1.4.4, and it produced the following errors:

$ mpirun -np 5 a.out
[hostname:6601] *** An error occurred in MPI_Allgatherv
[hostname:6601] *** on communicator
[hostname:6601] *** MPI_ERR_COUNT: invalid count argument
[hostname:6601] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--
mpirun has exited due to process rank 4 with PID 6601 on node hostname exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
--

I then recompiled using the Intel compilers, and it ran without error 10 out of 10 times. I then recompiled using the gcc 4.6.2/openmpi 1.4.4 combination, and it failed consistently. On the second and subsequent tries, it produced the following additional errors:

$ mpirun -np 5 a.out
[hostname:7168] *** An error occurred in MPI_Allgatherv
[hostname:7168] *** on communicator
[hostname:7168] *** MPI_ERR_COUNT: invalid count argument
[hostname:7168] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--
mpirun has exited due to process rank 2 with PID 7168 on node hostname exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
--
[hostname:07163] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[hostname:07163] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Not sure if that information is helpful or not. I am still completely puzzled why the number 5 is magic.

-- bennet
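For anyone without Lisandro's original allgather.c (it is not reproduced in this thread), a minimal sketch of an equivalent intercommunicator test might look like the following; the split point, tag, and one-int-per-rank sizes are my assumptions, and it expects at least 3 ranks (e.g. -np 5):

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        int rank, rsize, i, color;
        MPI_Comm split, intercomm;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Split COMM_WORLD into two groups (sizes 2 and 3 with -np 5)
         * and bridge them with an intercommunicator. */
        color = (rank < 2) ? 0 : 1;
        MPI_Comm_split(MPI_COMM_WORLD, color, rank, &split);
        MPI_Intercomm_create(split, 0, MPI_COMM_WORLD,
                             color ? 0 : 2, 42, &intercomm);

        /* Each rank contributes one int; the receive arrays are sized
         * by the size of the *remote* group. */
        MPI_Comm_remote_size(intercomm, &rsize);
        int sendval = rank;
        int *rcounts = malloc(rsize * sizeof(int));
        int *displs  = malloc(rsize * sizeof(int));
        int *recvbuf = malloc(rsize * sizeof(int));
        for (i = 0; i < rsize; ++i) { rcounts[i] = 1; displs[i] = i; }

        MPI_Allgatherv(&sendval, 1, MPI_INT, recvbuf, rcounts, displs,
                       MPI_INT, intercomm);

        MPI_Comm_free(&intercomm);
        MPI_Comm_free(&split);
        MPI_Finalize();
        return 0;
    }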
Re: [OMPI users] openmpi-1.6 undefined reference
On May 23, 2012, at 6:20 PM, marco atzeri wrote: > ~ 90% of the time we have mismatch problems between upstream and > cygwin on autoconf/automake/libtool versions that are not cygwin > aware or updated. Ok, fair enough. I'd be curious if you actually need to do this with Open MPI -- we use very recent versions of the GNU Autotools to bootstrap our tarballs. -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] openmpi-1.6 undefined reference
On 5/23/2012 11:20 PM, Jeff Squyres wrote:
> On May 23, 2012, at 9:53 AM, marco atzeri wrote:
>> experience says that autoreconf is a good approach on cygwin, it is almost standard in our package build procedure.
>
> I'm still curious: why? (I'm *assuming* that you're building from an official Open MPI tarball -- is that incorrect?) I ask because we've already run autoreconf, meaning that official Open MPI tarballs are fully bootstrapped and do not need to have autogen (i.e., ultimately autoreconf) re-run on them. Specifically: I'm unaware of a reason why you should need to re-run autogen (autoreconf) on an otherwise-unaltered Open MPI that was freshly extracted from a tarball. Does something happen differently if you *don't* re-run autogen (autoreconf)? Re-running autogen shouldn't be causing you any problems, of course -- this is just my curiosity asserting itself...

Hi Jeff,

~90% of the time we have mismatch problems between upstream and cygwin on autoconf/automake/libtool versions that are not cygwin-aware or updated. As a safe approach, we prefer to apply "autoreconf -i -f" by default when building binary packages.

See cygautoreconf at http://cygwin-ports.svn.sourceforge.net/viewvc/cygwin-ports/cygport/trunk/README

Regards
Marco
Re: [OMPI users] possible bug exercised by mpi4py
Thanks for all the info! But still, can we get a copy of the test in C? That would make it significantly easier for us to tell if there is a problem with Open MPI -- mainly because we don't know anything about the internals of mpi4py. On May 23, 2012, at 5:43 PM, Bennet Fauber wrote: > Thanks, Ralph, > > On Wed, 23 May 2012, Ralph Castain wrote: > >> I don't honestly think many of us have any knowledge of mpi4py. Does this >> test work with other MPIs? > > The mpi4py developers have said they've never seen this using mpich2. I have > not been able to test that myself. > >> MPI_Allgather seems to be passing our tests, so I suspect it is something in >> the binding. If you can provide the actual test, I'm willing to take a look >> at it. > > The actual test is included in the install bundle for mpi4py, along with the > C source code used to create the bindings. > > http://code.google.com/p/mpi4py/downloads/list > > The install is straightforward and simple. Unpack the tarball, make sure > that mpicc is in your path > > $ cd mpi4py-1.3 > $ python setup.py build > $ python setup.py install --prefix=/your/install > $ export PYTHONPATH=/your/install/lib/pythonN.M/site-packages > $ mpirun -np 5 python test/runtests.py \ >--verbose --no-threads --include cco_obj_inter > > where N.M are the major.minor numbers of your python distribution. > > What I find most puzzling is that, maybe, 1 out of 10 times it will run to > completion with -np 5, and it runs with all other numbers of processors I've > tested always. > > -- bennet > >> On May 23, 2012, at 2:52 PM, Bennet Fauber wrote: >> >>> I've installed the latest mpi4py-1.3 on several systems, and there is a >>> repeated bug when running >>> >>> $ mpirun -np 5 python test/runtests.py >>> >>> where it throws an error on mpigather with openmpi-1.4.4 and hangs with >>> openmpi-1.3. >>> >>> It runs to completion and passes all tests when run with -np of 2, 3, 4, 6, >>> 7, 8, 9, 10, 11, and 12. >>> >>> There is a thread on this at >>> >>> http://groups.google.com/group/mpi4py/browse_thread/thread/509ac46af6f79973 >>> >>> where others report being able to replicate, too. >>> >>> The compiler used first was gcc-4.6.2, with openmpi-1.4.4. >>> >>> These are all Red Hat machines, RHEL 5 or 6 and with multiple compilers and >>> versions of openmpi 1.3.0 and 1.4.4. >>> >>> Lisandro who is the primary developer of mpi4py is able to replicate on >>> Fedora 16. >>> >>> Someone else is able to reproduce with >>> >>> [ quoting from the groups.google.com page... ] >>> === >>> It also happens with the current hg version of mpi4py and >>> $ rpm -qa openmpi gcc python >>> python-2.7.3-6.fc17.x86_64 >>> gcc-4.7.0-5.fc17.x86_64 >>> openmpi-1.5.4-5.fc17.1.x86_64 >>> === >>> >>> So, I believe this is a bug to be reported. Per the advice at >>> >>> http://www.open-mpi.org/community/help/bugs.php >>> >>> If you feel that you do have a definite bug to report but are >>> unsure which list to post to, then post to the user's list. >>> >>> Please let me know if there is additional information that you need to >>> replicate. >>> >>> Some output is included below the signature in case it is useful. 
>>> >>> -- bennet >>> -- >>> East Hall Technical Services >>> Mathematics and Psychology Research Computing >>> University of Michigan >>> (734) 763-1182 >>> >>> On RHEL 5, openmpi 1.3, gcc 4.1.2, python 2.7 >>> >>> $ mpirun -np 5 --mca btl ^sm python test/runtests.py --verbose --no-threads >>> --include cco_obj_inter >>> [0...@sirocco.math.lsa.umich.edu] Python 2.7 >>> (/home/bennet/epd7.2.2/bin/python) >>> [0...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) >>> [0...@sirocco.math.lsa.umich.edu] mpi4py 1.3 >>> (build/lib.linux-x86_64-2.7/mpi4py) >>> [1...@sirocco.math.lsa.umich.edu] Python 2.7 >>> (/home/bennet/epd7.2.2/bin/python) >>> [1...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) >>> [1...@sirocco.math.lsa.umich.edu] mpi4py 1.3 >>> (build/lib.linux-x86_64-2.7/mpi4py) >>> [2...@sirocco.math.lsa.umich.edu] Python 2.7 >>> (/home/bennet/epd7.2.2/bin/python) >>> [2...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) >>> [2...@sirocco.math.lsa.umich.edu] mpi4py 1.3 >>> (build/lib.linux-x86_64-2.7/mpi4py) >>> [3...@sirocco.math.lsa.umich.edu] Python 2.7 >>> (/home/bennet/epd7.2.2/bin/python) >>> [3...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) >>> [3...@sirocco.math.lsa.umich.edu] mpi4py 1.3 >>> (build/lib.linux-x86_64-2.7/mpi4py) >>> [4...@sirocco.math.lsa.umich.edu] Python 2.7 >>> (/home/bennet/epd7.2.2/bin/python) >>> [4...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) >>> [4...@sirocco.math.lsa.umich.edu] mpi4py 1.3 >>> (build/lib.linux-x86_64-2.7/mpi4py) >>> testAllgather
Re: [OMPI users] possible bug exercised by mpi4py
Jeff, Well, not really, since the test is written in python ;-) The mpi4py source code is at http://code.google.com/p/mpi4py/downloads/list but I'm not sure what else I can provide, though. I'm more the reporting middleman here. I'd be happy to try to connect you and the developer of mpi4py. It seems like openmpi should work regardless what value -np is used, which is what puzzles both me and the mpi4py developers. -- bennet On Wed, 23 May 2012, Jeff Squyres wrote: Can you provide us with a C version of the test? On May 23, 2012, at 4:52 PM, Bennet Fauber wrote: I've installed the latest mpi4py-1.3 on several systems, and there is a repeated bug when running $ mpirun -np 5 python test/runtests.py where it throws an error on mpigather with openmpi-1.4.4 and hangs with openmpi-1.3. It runs to completion and passes all tests when run with -np of 2, 3, 4, 6, 7, 8, 9, 10, 11, and 12. There is a thread on this at http://groups.google.com/group/mpi4py/browse_thread/thread/509ac46af6f79973 where others report being able to replicate, too. The compiler used first was gcc-4.6.2, with openmpi-1.4.4. These are all Red Hat machines, RHEL 5 or 6 and with multiple compilers and versions of openmpi 1.3.0 and 1.4.4. Lisandro who is the primary developer of mpi4py is able to replicate on Fedora 16. Someone else is able to reproduce with [ quoting from the groups.google.com page... ] === It also happens with the current hg version of mpi4py and $ rpm -qa openmpi gcc python python-2.7.3-6.fc17.x86_64 gcc-4.7.0-5.fc17.x86_64 openmpi-1.5.4-5.fc17.1.x86_64 === So, I believe this is a bug to be reported. Per the advice at http://www.open-mpi.org/community/help/bugs.php If you feel that you do have a definite bug to report but are unsure which list to post to, then post to the user's list. Please let me know if there is additional information that you need to replicate. Some output is included below the signature in case it is useful. -- bennet -- East Hall Technical Services Mathematics and Psychology Research Computing University of Michigan (734) 763-1182 On RHEL 5, openmpi 1.3, gcc 4.1.2, python 2.7 $ mpirun -np 5 --mca btl ^sm python test/runtests.py --verbose --no-threads --include cco_obj_inter [0...@sirocco.math.lsa.umich.edu] Python 2.7 (/home/bennet/epd7.2.2/bin/python) [0...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) [0...@sirocco.math.lsa.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.7/mpi4py) [1...@sirocco.math.lsa.umich.edu] Python 2.7 (/home/bennet/epd7.2.2/bin/python) [1...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) [1...@sirocco.math.lsa.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.7/mpi4py) [2...@sirocco.math.lsa.umich.edu] Python 2.7 (/home/bennet/epd7.2.2/bin/python) [2...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) [2...@sirocco.math.lsa.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.7/mpi4py) [3...@sirocco.math.lsa.umich.edu] Python 2.7 (/home/bennet/epd7.2.2/bin/python) [3...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) [3...@sirocco.math.lsa.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.7/mpi4py) [4...@sirocco.math.lsa.umich.edu] Python 2.7 (/home/bennet/epd7.2.2/bin/python) [4...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) [4...@sirocco.math.lsa.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.7/mpi4py) testAllgather (test_cco_obj_inter.TestCCOObjInter) ... testAllgather (test_cco_obj_inter.TestCCOObjInter) ... testAllgather (test_cco_obj_inter.TestCCOObjInter) ... testAllgather (test_cco_obj_inter.TestCCOObjInter) ... 
testAllgather (test_cco_obj_inter.TestCCOObjInter) ... [ hangs ] RHEL5 === $ python Python 2.6.6 (r266:84292, Sep 12 2011, 14:03:14) [GCC 4.4.5 20110214 (Red Hat 4.4.5-6)] on linux2 $ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/home/software/rhel6/gcc/4.7.0/libexec/gcc/x86_64- unknown-linux-gnu/4.7.0/lto-wrapper Target: x86_64-unknown-linux-gnu Configured with: ../gcc-4.7.0/configure --prefix=/home/software/rhel6/ gcc/4.7.0 --with-mpfr=/home/software/rhel6/gcc/mpfr-3.1.0/ --with-mpc=/ home/software/rhel6/gcc/mpc-0.9/ --with-gmp=/home/software/rhel6/gcc/ gmp-5.0.5/ --disable-multilib Thread model: posix gcc version 4.7.0 (GCC) $ mpirun -np 5 python test/runtests.py --verbose --no-threads --include cco_obj_inter [4...@host-rh6.engin.umich.edu] Python 2.6 (/usr/bin/python) [4...@host-rh6.engin.umich.edu] MPI 2.1 (Open MPI 1.6.0) [4...@host-rh6.engin.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.6/mpi4py) [2...@host-rh6.engin.umich.edu] Python 2.6 (/usr/bin/python) [2...@host-rh6.engin.umich.edu] MPI 2.1 (Open MPI 1.6.0) [2...@host-rh6.engin.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.6/mpi4py)
Re: [OMPI users] possible bug exercised by mpi4py
Thanks, Ralph, On Wed, 23 May 2012, Ralph Castain wrote: I don't honestly think many of us have any knowledge of mpi4py. Does this test work with other MPIs? The mpi4py developers have said they've never seen this using mpich2. I have not been able to test that myself. MPI_Allgather seems to be passing our tests, so I suspect it is something in the binding. If you can provide the actual test, I'm willing to take a look at it. The actual test is included in the install bundle for mpi4py, along with the C source code used to create the bindings. http://code.google.com/p/mpi4py/downloads/list The install is straightforward and simple. Unpack the tarball, make sure that mpicc is in your path $ cd mpi4py-1.3 $ python setup.py build $ python setup.py install --prefix=/your/install $ export PYTHONPATH=/your/install/lib/pythonN.M/site-packages $ mpirun -np 5 python test/runtests.py \ --verbose --no-threads --include cco_obj_inter where N.M are the major.minor numbers of your python distribution. What I find most puzzling is that, maybe, 1 out of 10 times it will run to completion with -np 5, and it runs with all other numbers of processors I've tested always. -- bennet On May 23, 2012, at 2:52 PM, Bennet Fauber wrote: I've installed the latest mpi4py-1.3 on several systems, and there is a repeated bug when running $ mpirun -np 5 python test/runtests.py where it throws an error on mpigather with openmpi-1.4.4 and hangs with openmpi-1.3. It runs to completion and passes all tests when run with -np of 2, 3, 4, 6, 7, 8, 9, 10, 11, and 12. There is a thread on this at http://groups.google.com/group/mpi4py/browse_thread/thread/509ac46af6f79973 where others report being able to replicate, too. The compiler used first was gcc-4.6.2, with openmpi-1.4.4. These are all Red Hat machines, RHEL 5 or 6 and with multiple compilers and versions of openmpi 1.3.0 and 1.4.4. Lisandro who is the primary developer of mpi4py is able to replicate on Fedora 16. Someone else is able to reproduce with [ quoting from the groups.google.com page... ] === It also happens with the current hg version of mpi4py and $ rpm -qa openmpi gcc python python-2.7.3-6.fc17.x86_64 gcc-4.7.0-5.fc17.x86_64 openmpi-1.5.4-5.fc17.1.x86_64 === So, I believe this is a bug to be reported. Per the advice at http://www.open-mpi.org/community/help/bugs.php If you feel that you do have a definite bug to report but are unsure which list to post to, then post to the user's list. Please let me know if there is additional information that you need to replicate. Some output is included below the signature in case it is useful. 
-- bennet -- East Hall Technical Services Mathematics and Psychology Research Computing University of Michigan (734) 763-1182 On RHEL 5, openmpi 1.3, gcc 4.1.2, python 2.7 $ mpirun -np 5 --mca btl ^sm python test/runtests.py --verbose --no-threads --include cco_obj_inter [0...@sirocco.math.lsa.umich.edu] Python 2.7 (/home/bennet/epd7.2.2/bin/python) [0...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) [0...@sirocco.math.lsa.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.7/mpi4py) [1...@sirocco.math.lsa.umich.edu] Python 2.7 (/home/bennet/epd7.2.2/bin/python) [1...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) [1...@sirocco.math.lsa.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.7/mpi4py) [2...@sirocco.math.lsa.umich.edu] Python 2.7 (/home/bennet/epd7.2.2/bin/python) [2...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) [2...@sirocco.math.lsa.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.7/mpi4py) [3...@sirocco.math.lsa.umich.edu] Python 2.7 (/home/bennet/epd7.2.2/bin/python) [3...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) [3...@sirocco.math.lsa.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.7/mpi4py) [4...@sirocco.math.lsa.umich.edu] Python 2.7 (/home/bennet/epd7.2.2/bin/python) [4...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) [4...@sirocco.math.lsa.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.7/mpi4py) testAllgather (test_cco_obj_inter.TestCCOObjInter) ... testAllgather (test_cco_obj_inter.TestCCOObjInter) ... testAllgather (test_cco_obj_inter.TestCCOObjInter) ... testAllgather (test_cco_obj_inter.TestCCOObjInter) ... testAllgather (test_cco_obj_inter.TestCCOObjInter) ... [ hangs ] RHEL5 === $ python Python 2.6.6 (r266:84292, Sep 12 2011, 14:03:14) [GCC 4.4.5 20110214 (Red Hat 4.4.5-6)] on linux2 $ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/home/software/rhel6/gcc/4.7.0/libexec/gcc/x86_64- unknown-linux-gnu/4.7.0/lto-wrapper Target: x86_64-unknown-linux-gnu Configured with: ../gcc-4.7.0/configure
Re: [OMPI users] possible bug exercised by mpi4py
Can you provide us with a C version of the test? On May 23, 2012, at 4:52 PM, Bennet Fauber wrote: > I've installed the latest mpi4py-1.3 on several systems, and there is a > repeated bug when running > > $ mpirun -np 5 python test/runtests.py > > where it throws an error on mpigather with openmpi-1.4.4 and hangs with > openmpi-1.3. > > It runs to completion and passes all tests when run with -np of 2, 3, 4, 6, > 7, 8, 9, 10, 11, and 12. > > There is a thread on this at > > http://groups.google.com/group/mpi4py/browse_thread/thread/509ac46af6f79973 > > where others report being able to replicate, too. > > The compiler used first was gcc-4.6.2, with openmpi-1.4.4. > > These are all Red Hat machines, RHEL 5 or 6 and with multiple compilers and > versions of openmpi 1.3.0 and 1.4.4. > > Lisandro who is the primary developer of mpi4py is able to replicate on > Fedora 16. > > Someone else is able to reproduce with > > [ quoting from the groups.google.com page... ] > === > It also happens with the current hg version of mpi4py and > $ rpm -qa openmpi gcc python > python-2.7.3-6.fc17.x86_64 > gcc-4.7.0-5.fc17.x86_64 > openmpi-1.5.4-5.fc17.1.x86_64 > === > > So, I believe this is a bug to be reported. Per the advice at > > http://www.open-mpi.org/community/help/bugs.php > > If you feel that you do have a definite bug to report but are > unsure which list to post to, then post to the user's list. > > Please let me know if there is additional information that you need to > replicate. > > Some output is included below the signature in case it is useful. > > -- bennet > -- > East Hall Technical Services > Mathematics and Psychology Research Computing > University of Michigan > (734) 763-1182 > > On RHEL 5, openmpi 1.3, gcc 4.1.2, python 2.7 > > $ mpirun -np 5 --mca btl ^sm python test/runtests.py --verbose --no-threads > --include cco_obj_inter > [0...@sirocco.math.lsa.umich.edu] Python 2.7 > (/home/bennet/epd7.2.2/bin/python) > [0...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) > [0...@sirocco.math.lsa.umich.edu] mpi4py 1.3 > (build/lib.linux-x86_64-2.7/mpi4py) > [1...@sirocco.math.lsa.umich.edu] Python 2.7 > (/home/bennet/epd7.2.2/bin/python) > [1...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) > [1...@sirocco.math.lsa.umich.edu] mpi4py 1.3 > (build/lib.linux-x86_64-2.7/mpi4py) > [2...@sirocco.math.lsa.umich.edu] Python 2.7 > (/home/bennet/epd7.2.2/bin/python) > [2...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) > [2...@sirocco.math.lsa.umich.edu] mpi4py 1.3 > (build/lib.linux-x86_64-2.7/mpi4py) > [3...@sirocco.math.lsa.umich.edu] Python 2.7 > (/home/bennet/epd7.2.2/bin/python) > [3...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) > [3...@sirocco.math.lsa.umich.edu] mpi4py 1.3 > (build/lib.linux-x86_64-2.7/mpi4py) > [4...@sirocco.math.lsa.umich.edu] Python 2.7 > (/home/bennet/epd7.2.2/bin/python) > [4...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) > [4...@sirocco.math.lsa.umich.edu] mpi4py 1.3 > (build/lib.linux-x86_64-2.7/mpi4py) > testAllgather (test_cco_obj_inter.TestCCOObjInter) ... testAllgather > (test_cco_obj_inter.TestCCOObjInter) ... testAllgather > (test_cco_obj_inter.TestCCOObjInter) ... testAllgather > (test_cco_obj_inter.TestCCOObjInter) ... testAllgather > (test_cco_obj_inter.TestCCOObjInter) ... > [ hangs ] > > RHEL5 > === > $ python > Python 2.6.6 (r266:84292, Sep 12 2011, 14:03:14) > [GCC 4.4.5 20110214 (Red Hat 4.4.5-6)] on linux2 > > $ gcc -v > Using built-in specs. 
> COLLECT_GCC=gcc > COLLECT_LTO_WRAPPER=/home/software/rhel6/gcc/4.7.0/libexec/gcc/x86_64- > unknown-linux-gnu/4.7.0/lto-wrapper > Target: x86_64-unknown-linux-gnu > Configured with: ../gcc-4.7.0/configure --prefix=/home/software/rhel6/ > gcc/4.7.0 --with-mpfr=/home/software/rhel6/gcc/mpfr-3.1.0/ --with-mpc=/ > home/software/rhel6/gcc/mpc-0.9/ --with-gmp=/home/software/rhel6/gcc/ > gmp-5.0.5/ --disable-multilib > Thread model: posix > gcc version 4.7.0 (GCC) > > $ mpirun -np 5 python test/runtests.py --verbose --no-threads --include > cco_obj_inter > [4...@host-rh6.engin.umich.edu] Python 2.6 (/usr/bin/python) > [4...@host-rh6.engin.umich.edu] MPI 2.1 (Open MPI 1.6.0) > [4...@host-rh6.engin.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.6/mpi4py) > [2...@host-rh6.engin.umich.edu] Python 2.6 (/usr/bin/python) > [2...@host-rh6.engin.umich.edu] MPI 2.1 (Open MPI 1.6.0) > [2...@host-rh6.engin.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.6/mpi4py) > [1...@host-rh6.engin.umich.edu] Python 2.6 (/usr/bin/python) > [1...@host-rh6.engin.umich.edu] MPI 2.1 (Open MPI 1.6.0) > [1...@host-rh6.engin.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.6/mpi4py) > [0...@host-rh6.engin.umich.edu] Python 2.6 (/usr/bin/python) > [0...@host-rh6.engin.umich.edu]
Re: [OMPI users] openmpi-1.6 undefined reference
On May 23, 2012, at 9:53 AM, marco atzeri wrote: > experience says that autoreconf is a good approach on cygwin, > it is almost standard on our package build procedure. I'm still curious: why? (I'm *assuming* that you're building from an official Open MPI tarball -- is that incorrect?) I ask because we've already run autoreconf, meaning that official Open MPI tarballs are fully bootstrapped and do not need to have autogen (i.e., ultimately autoreconf) re-run on them. Specifically: I'm unaware of a reason why you should need to re-run autogen (autoreconf) on an otherwise-unaltered Open MPI that was freshly extracted from a tarball. Does something happen differently if you *don't* re-run autogen (autoreconf)? Re-running autogen shouldn't be causing you any problems, of course -- this is just my curiosity asserting itself... -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] possible bug exercised by mpi4py
I don't honestly think many of us have any knowledge of mpi4py. Does this test work with other MPIs? MPI_Allgather seems to be passing our tests, so I suspect it is something in the binding. If you can provide the actual test, I'm willing to take a look at it. On May 23, 2012, at 2:52 PM, Bennet Fauber wrote: > I've installed the latest mpi4py-1.3 on several systems, and there is a > repeated bug when running > > $ mpirun -np 5 python test/runtests.py > > where it throws an error on mpigather with openmpi-1.4.4 and hangs with > openmpi-1.3. > > It runs to completion and passes all tests when run with -np of 2, 3, 4, 6, > 7, 8, 9, 10, 11, and 12. > > There is a thread on this at > > http://groups.google.com/group/mpi4py/browse_thread/thread/509ac46af6f79973 > > where others report being able to replicate, too. > > The compiler used first was gcc-4.6.2, with openmpi-1.4.4. > > These are all Red Hat machines, RHEL 5 or 6 and with multiple compilers and > versions of openmpi 1.3.0 and 1.4.4. > > Lisandro who is the primary developer of mpi4py is able to replicate on > Fedora 16. > > Someone else is able to reproduce with > > [ quoting from the groups.google.com page... ] > === > It also happens with the current hg version of mpi4py and > $ rpm -qa openmpi gcc python > python-2.7.3-6.fc17.x86_64 > gcc-4.7.0-5.fc17.x86_64 > openmpi-1.5.4-5.fc17.1.x86_64 > === > > So, I believe this is a bug to be reported. Per the advice at > > http://www.open-mpi.org/community/help/bugs.php > > If you feel that you do have a definite bug to report but are > unsure which list to post to, then post to the user's list. > > Please let me know if there is additional information that you need to > replicate. > > Some output is included below the signature in case it is useful. > > -- bennet > -- > East Hall Technical Services > Mathematics and Psychology Research Computing > University of Michigan > (734) 763-1182 > > On RHEL 5, openmpi 1.3, gcc 4.1.2, python 2.7 > > $ mpirun -np 5 --mca btl ^sm python test/runtests.py --verbose --no-threads > --include cco_obj_inter > [0...@sirocco.math.lsa.umich.edu] Python 2.7 > (/home/bennet/epd7.2.2/bin/python) > [0...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) > [0...@sirocco.math.lsa.umich.edu] mpi4py 1.3 > (build/lib.linux-x86_64-2.7/mpi4py) > [1...@sirocco.math.lsa.umich.edu] Python 2.7 > (/home/bennet/epd7.2.2/bin/python) > [1...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) > [1...@sirocco.math.lsa.umich.edu] mpi4py 1.3 > (build/lib.linux-x86_64-2.7/mpi4py) > [2...@sirocco.math.lsa.umich.edu] Python 2.7 > (/home/bennet/epd7.2.2/bin/python) > [2...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) > [2...@sirocco.math.lsa.umich.edu] mpi4py 1.3 > (build/lib.linux-x86_64-2.7/mpi4py) > [3...@sirocco.math.lsa.umich.edu] Python 2.7 > (/home/bennet/epd7.2.2/bin/python) > [3...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) > [3...@sirocco.math.lsa.umich.edu] mpi4py 1.3 > (build/lib.linux-x86_64-2.7/mpi4py) > [4...@sirocco.math.lsa.umich.edu] Python 2.7 > (/home/bennet/epd7.2.2/bin/python) > [4...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) > [4...@sirocco.math.lsa.umich.edu] mpi4py 1.3 > (build/lib.linux-x86_64-2.7/mpi4py) > testAllgather (test_cco_obj_inter.TestCCOObjInter) ... testAllgather > (test_cco_obj_inter.TestCCOObjInter) ... testAllgather > (test_cco_obj_inter.TestCCOObjInter) ... testAllgather > (test_cco_obj_inter.TestCCOObjInter) ... testAllgather > (test_cco_obj_inter.TestCCOObjInter) ... 
> [ hangs ] > > RHEL5 > === > $ python > Python 2.6.6 (r266:84292, Sep 12 2011, 14:03:14) > [GCC 4.4.5 20110214 (Red Hat 4.4.5-6)] on linux2 > > $ gcc -v > Using built-in specs. > COLLECT_GCC=gcc > COLLECT_LTO_WRAPPER=/home/software/rhel6/gcc/4.7.0/libexec/gcc/x86_64- > unknown-linux-gnu/4.7.0/lto-wrapper > Target: x86_64-unknown-linux-gnu > Configured with: ../gcc-4.7.0/configure --prefix=/home/software/rhel6/ > gcc/4.7.0 --with-mpfr=/home/software/rhel6/gcc/mpfr-3.1.0/ --with-mpc=/ > home/software/rhel6/gcc/mpc-0.9/ --with-gmp=/home/software/rhel6/gcc/ > gmp-5.0.5/ --disable-multilib > Thread model: posix > gcc version 4.7.0 (GCC) > > $ mpirun -np 5 python test/runtests.py --verbose --no-threads --include > cco_obj_inter > [4...@host-rh6.engin.umich.edu] Python 2.6 (/usr/bin/python) > [4...@host-rh6.engin.umich.edu] MPI 2.1 (Open MPI 1.6.0) > [4...@host-rh6.engin.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.6/mpi4py) > [2...@host-rh6.engin.umich.edu] Python 2.6 (/usr/bin/python) > [2...@host-rh6.engin.umich.edu] MPI 2.1 (Open MPI 1.6.0) > [2...@host-rh6.engin.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.6/mpi4py) > [1...@host-rh6.engin.umich.edu] Python 2.6 (/usr/bin/python) >
[OMPI users] possible bug exercised by mpi4py
I've installed the latest mpi4py-1.3 on several systems, and there is a repeated bug when running $ mpirun -np 5 python test/runtests.py where it throws an error on mpigather with openmpi-1.4.4 and hangs with openmpi-1.3. It runs to completion and passes all tests when run with -np of 2, 3, 4, 6, 7, 8, 9, 10, 11, and 12. There is a thread on this at http://groups.google.com/group/mpi4py/browse_thread/thread/509ac46af6f79973 where others report being able to replicate, too. The compiler used first was gcc-4.6.2, with openmpi-1.4.4. These are all Red Hat machines, RHEL 5 or 6 and with multiple compilers and versions of openmpi 1.3.0 and 1.4.4. Lisandro who is the primary developer of mpi4py is able to replicate on Fedora 16. Someone else is able to reproduce with [ quoting from the groups.google.com page... ] === It also happens with the current hg version of mpi4py and $ rpm -qa openmpi gcc python python-2.7.3-6.fc17.x86_64 gcc-4.7.0-5.fc17.x86_64 openmpi-1.5.4-5.fc17.1.x86_64 === So, I believe this is a bug to be reported. Per the advice at http://www.open-mpi.org/community/help/bugs.php If you feel that you do have a definite bug to report but are unsure which list to post to, then post to the user's list. Please let me know if there is additional information that you need to replicate. Some output is included below the signature in case it is useful. -- bennet -- East Hall Technical Services Mathematics and Psychology Research Computing University of Michigan (734) 763-1182 On RHEL 5, openmpi 1.3, gcc 4.1.2, python 2.7 $ mpirun -np 5 --mca btl ^sm python test/runtests.py --verbose --no-threads --include cco_obj_inter [0...@sirocco.math.lsa.umich.edu] Python 2.7 (/home/bennet/epd7.2.2/bin/python) [0...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) [0...@sirocco.math.lsa.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.7/mpi4py) [1...@sirocco.math.lsa.umich.edu] Python 2.7 (/home/bennet/epd7.2.2/bin/python) [1...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) [1...@sirocco.math.lsa.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.7/mpi4py) [2...@sirocco.math.lsa.umich.edu] Python 2.7 (/home/bennet/epd7.2.2/bin/python) [2...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) [2...@sirocco.math.lsa.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.7/mpi4py) [3...@sirocco.math.lsa.umich.edu] Python 2.7 (/home/bennet/epd7.2.2/bin/python) [3...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) [3...@sirocco.math.lsa.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.7/mpi4py) [4...@sirocco.math.lsa.umich.edu] Python 2.7 (/home/bennet/epd7.2.2/bin/python) [4...@sirocco.math.lsa.umich.edu] MPI 2.0 (Open MPI 1.3.0) [4...@sirocco.math.lsa.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.7/mpi4py) testAllgather (test_cco_obj_inter.TestCCOObjInter) ... testAllgather (test_cco_obj_inter.TestCCOObjInter) ... testAllgather (test_cco_obj_inter.TestCCOObjInter) ... testAllgather (test_cco_obj_inter.TestCCOObjInter) ... testAllgather (test_cco_obj_inter.TestCCOObjInter) ... [ hangs ] RHEL5 === $ python Python 2.6.6 (r266:84292, Sep 12 2011, 14:03:14) [GCC 4.4.5 20110214 (Red Hat 4.4.5-6)] on linux2 $ gcc -v Using built-in specs. 
COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/home/software/rhel6/gcc/4.7.0/libexec/gcc/x86_64- unknown-linux-gnu/4.7.0/lto-wrapper Target: x86_64-unknown-linux-gnu Configured with: ../gcc-4.7.0/configure --prefix=/home/software/rhel6/ gcc/4.7.0 --with-mpfr=/home/software/rhel6/gcc/mpfr-3.1.0/ --with-mpc=/ home/software/rhel6/gcc/mpc-0.9/ --with-gmp=/home/software/rhel6/gcc/ gmp-5.0.5/ --disable-multilib Thread model: posix gcc version 4.7.0 (GCC) $ mpirun -np 5 python test/runtests.py --verbose --no-threads --include cco_obj_inter [4...@host-rh6.engin.umich.edu] Python 2.6 (/usr/bin/python) [4...@host-rh6.engin.umich.edu] MPI 2.1 (Open MPI 1.6.0) [4...@host-rh6.engin.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.6/mpi4py) [2...@host-rh6.engin.umich.edu] Python 2.6 (/usr/bin/python) [2...@host-rh6.engin.umich.edu] MPI 2.1 (Open MPI 1.6.0) [2...@host-rh6.engin.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.6/mpi4py) [1...@host-rh6.engin.umich.edu] Python 2.6 (/usr/bin/python) [1...@host-rh6.engin.umich.edu] MPI 2.1 (Open MPI 1.6.0) [1...@host-rh6.engin.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.6/mpi4py) [0...@host-rh6.engin.umich.edu] Python 2.6 (/usr/bin/python) [0...@host-rh6.engin.umich.edu] MPI 2.1 (Open MPI 1.6.0) [0...@host-rh6.engin.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.6/mpi4py) [3...@host-rh6.engin.umich.edu] Python 2.6 (/usr/bin/python) [3...@host-rh6.engin.umich.edu] MPI 2.1 (Open MPI 1.6.0) [3...@host-rh6.engin.umich.edu] mpi4py 1.3 (build/lib.linux-x86_64-2.6/mpi4py) testAllgather
Re: [OMPI users] openmpi-1.6 undefined reference
On 5/23/2012 2:08 PM, Ralph Castain wrote:
> Add "libompitrace" to your enable-contrib-no-build list. There is likely a missing include in there, but you don't need that lib to run. We'll take a look at it.

Thanks. The build was fine, and the check passed almost everything. From "grep -i pass openmpi-1.6-1-check.log":

PASS: predefined_gap_test.exe
File opened with dladvise_local, all passed
PASS: dlopen_test.exe
All 2 tests passed
- 1 threads: Passed
- 2 threads: Passed
- 4 threads: Passed
- 5 threads: Passed
- 8 threads: Passed
PASS: atomic_barrier.exe
- 1 threads: Passed
- 2 threads: Passed
- 4 threads: Passed
- 5 threads: Passed
- 8 threads: Passed
PASS: atomic_barrier_noinline.exe
- 1 threads: Passed
- 2 threads: Passed
- 4 threads: Passed
- 5 threads: Passed
- 8 threads: Passed
PASS: atomic_spinlock.exe
- 1 threads: Passed
- 2 threads: Passed
- 4 threads: Passed
- 5 threads: Passed
- 8 threads: Passed
PASS: atomic_spinlock_noinline.exe
- 1 threads: Passed
- 2 threads: Passed
- 4 threads: Passed
- 5 threads: Passed
- 8 threads: Passed
PASS: atomic_math.exe
- 1 threads: Passed
- 2 threads: Passed
- 4 threads: Passed
- 5 threads: Passed
- 8 threads: Passed
PASS: atomic_math_noinline.exe
- 1 threads: Passed
- 2 threads: Passed
- 4 threads: Passed
- 5 threads: Passed
- 8 threads: Passed
PASS: atomic_cmpset.exe
- 1 threads: Passed
- 2 threads: Passed
- 4 threads: Passed
- 5 threads: Passed
- 8 threads: Passed
PASS: atomic_cmpset_noinline.exe
All 8 tests passed
All 0 tests passed
All 0 tests passed
/pub/devel/openmpi/openmpi-1.6-1/src/openmpi-1.6/test/datatype/opal_datatype_test.c:533:5: warning: passing argument 1 of ‘test_create_blacs_type1’ discards qualifiers from pointer target type
/pub/devel/openmpi/openmpi-1.6-1/src/openmpi-1.6/test/datatype/opal_datatype_test.c:534:5: warning: passing argument 1 of ‘test_create_blacs_type2’ discards qualifiers from pointer target type
/pub/devel/openmpi/openmpi-1.6-1/src/openmpi-1.6/test/datatype/opal_ddt_lib.c:155:5: warning: passing argument 4 of ‘opal_datatype_create_struct’ from incompatible pointer type
decode [PASSED]
PASS: opal_datatype_test.exe
PASS: checksum.exe
PASS: position.exe
decode [PASSED]
PASS: ddt_test.exe
decode [NOT PASSED]   <<< ?
PASS: ddt_raw.exe
All 5 tests passed
SUPPORT: OMPI Test Passed: opal_path_nfs(): (0 tests)
PASS: opal_path_nfs.exe
1 test passed

The detailed log is no clearer; I cannot tell whether the test passed or not:

# * TEST INVERSED VECTOR #
raw extraction in 0 microsec
# * TEST STRANGE DATATYPE #
Strange datatype BEFORE COMMIT
Strange datatype AFTER COMMIT
raw extraction in 0 microsec
# * TEST UPPER TRIANGULAR MATRIX (size 100) #
raw extraction in 0 microsec
Example 3.26 type1 correct
Example 3.26 type1 correct
Example 3.26 type2 correct
type3 correct
hindexed ok
indexed ok
hvector ok
vector ok
# * TEST UPPER MATRIX #
test upper matrix
complete raw in 0 microsec
decode [NOT PASSED]
# * TEST MATRIX BORDERS #
# * TEST CONTIGUOUS #
test contiguous (alignement)
# * TEST STRUCT #
test struct
>><<
>><<
>><<
>><<
Contiguous data-type (MPI_DOUBLE)
raw extraction in 0 microsec
>><<
>><<
Contiguous multiple data-type (4500*1)
raw extraction in 0 microsec
Contiguous multiple data-type (450*10)
raw extraction in 0 microsec
Contiguous multiple data-type (45*100)
raw extraction in 0 microsec
Contiguous multiple data-type (100*45)
raw extraction in 0 microsec
Contiguous multiple data-type (10*450)
raw extraction in 0 microsec
Contiguous multiple data-type (1*4500)
raw extraction in 0 microsec
>><<
>><<
Vector data-type (450 times 10 double stride 11)
raw extraction in 0 microsec
>><<
>><<
Vector data-type (450 times 10 double stride 11)
raw extraction in 0 microsec
>><<
>><<
raw extraction in 1000 microsec
>><<
>><<
raw extraction in 0 microsec
>><<
>><<
raw extraction in 2000 microsec
>><<
>><<
raw extraction in 0 microsec
Re: [OMPI users] Shared Memory - Eager VS Rendezvous
On 05/23/2012 03:05 PM, Jeff Squyres wrote:
> On May 23, 2012, at 6:05 AM, Simone Pellegrini wrote:
>> If process A sends a message to process B and the eager protocol is used, then I assume that the message is written into a shared memory area and picked up by the receiver when the receive operation is posted.
>
> Open MPI has a few different shared memory protocols. For short messages, they always follow what you mention above: CICO. For large messages, we either use a pipelined CICO (as you surmised below) or use direct memory mapping if you have the Linux knem kernel module installed. More below.
>
>> When the rendezvous protocol is utilized, however, the message still needs to end up in the shared memory area somehow. I don't think any RDMA-like transfer exists for shared memory communications.
>
> Just to clarify: RDMA = Remote Direct Memory Access, and the "remote" usually refers to a different physical address space (e.g., a different server). In Open MPI's case, knem can use a direct memory copy between two processes.
>
>> Therefore you need to buffer this message somehow; however, I assume that you don't buffer the whole thing but use some type of pipelined protocol so that you reduce the size of the buffer you need to keep in shared memory.
>
> Correct. For large messages, when using CICO, we copy the first fragment and the necessary meta data to the shmem block. When the receiver ACKs the first fragment, we pipeline CICO the rest of the large message through the shmem block. With the sender and receiver (more or less) simultaneously writing and reading to the circular shmem block, we probably won't fill it up -- meaning that the sender hypothetically won't need to block. I'm skipping a bunch of details, but that's the general idea.
>
>> Is this completely wrong? It would be nice if someone could point me somewhere I can find more details about this. On the OpenMPI tuning page there are several details regarding the protocol utilized for IB but very little for SM.
>
> Good point. I'll see if we can get some more info up there.
>
>> I think I found the answer to my question on Jeff Squyres' blog: http://blogs.cisco.com/performance/shared-memory-as-an-mpi-transport-part-2/
>> However, now I have a new question: how do I know if my machine uses the copyin/copyout mechanism or the direct mapping?
>
> You need the Linux knem module. See the OMPI README and do a text search for "knem".

Thanks a lot for the clarification. However, I still have a hard time explaining the following phenomenon. I have a very simple code performing a ping/pong between 2 processes which are allocated on the same computing node. Each process is bound to a different CPU via affinity settings. I perform this operation with 3 cache scenarios:

1) Cache is completely invalidated before the send/recv (both at the sender and receiver side)
2) Cache is preloaded before the send/recv operation and is in "exclusive" state
3) Cache is preloaded before the send/recv operation, but this time the cache lines are in "modified" state

Now scenario 2 has a speedup over scenario 1, as expected. However, scenario 3 is much slower than 1. I observed this for both knem and xpmem. I assume something is forcing the modified cache lines to be written back to memory before the copy is performed -- probably because the segment is accessed through a volatile pointer, so the data held in cache has to be written to main memory. Instead, when the Open MPI CICO protocol is used, scenarios 2 and 3 have the exact same speedup over 1. Therefore I assume that in this case nothing forces the write-back of dirty cache lines.

I have been questioning myself about this issue since yesterday, and it is quite difficult to understand without knowing all the internal details. Is this expected behaviour for you too, or do you find it surprising? :)

cheers, Simone
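For reference, the three scenarios can be set up on x86 along these lines (a sketch; the 64-byte line size and SSE2 intrinsics are platform assumptions, not the benchmark code actually used above):

    #include <emmintrin.h>  /* _mm_clflush, _mm_mfence (SSE2) */
    #include <stddef.h>

    /* Scenario 1: flush every cache line of the buffer (invalidated). */
    static void flush_buffer(volatile char *buf, size_t len) {
        for (size_t i = 0; i < len; i += 64)
            _mm_clflush((const void *)(buf + i));
        _mm_mfence();
    }

    /* Scenario 2: read every line so it sits in cache unmodified
     * (Exclusive/Shared state). */
    static volatile char sink;
    static void preload_clean(volatile char *buf, size_t len) {
        for (size_t i = 0; i < len; i += 64)
            sink = buf[i];
    }

    /* Scenario 3: write every line so it sits in cache dirty (Modified
     * state) and must be written back before another agent reads it. */
    static void preload_dirty(volatile char *buf, size_t len) {
        for (size_t i = 0; i < len; i += 64)
            buf[i] = (char)i;
    }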
Re: [OMPI users] Shared Memory - Eager VS Rendezvous
On May 23, 2012, at 7:05 AM, Jeff Squyres wrote: > On May 23, 2012, at 6:05 AM, Simone Pellegrini wrote: > >>> If process A sends a message to process B and the eager protocol is used >>> then I assume that the message is written into a shared memory area and >>> picked up by the receiver when the receive operation is posted. > > Open MPI has a few different shared memory protocols. > > For short messages, they always follow what you mention above: CICO. > > For large messages, we either use a pipelined CICO (as you surmised below) or > use direct memory mapping if you have the Linux knem kernel module installed. > More below. > >>> When the rendezvous is utilized however the message still need to end up in >>> the shared memory area somehow. I don't think any RDMA-like transfer exists >>> for shared memory communications. > > Just to clarify: RDMA = Remote Direct Memory Access, and the "remote" usually > refers to a different physical address space (e.g., a different server). > > In Open MPI's case, knem can use a direct memory copy between two processes. In addition, the vader BTL (XPMEM BTL) also provides similar functionality - provided the XPMEM kernel module and user library are available on the system. Based on my limited experience with the two, I've noticed that knem is well-suited for Intel architectures, while XPMEM is best for AMD architectures. Samuel K. Gutierrez Los Alamos National Laboratory > >>> Therefore you need to buffer this message somehow, however I assume >>> that you don't buffer the whole thing but use some type of pipelined >>> protocol so that you reduce the size of the buffer you need to keep in the >>> shared memory. > > Correct. For large messages, when using CICO, we copy the first fragment and > the necessary meta data to the shmem block. When the receiver ACKs the first > fragment, we pipeline CICO the rest of the large message through the shmem > block. With the sender and receiver (more or less) simultaneously writing > and reading to the circular shmem block, we probably won't fill it up -- > meaning that the sender hypothetically won't need to block. > > I'm skipping a bunch of details, but that's the general idea. > >>> Is it completely wrong? It would be nice if someone could point me >>> somewhere I can find more details about this. In the OpenMPI tuning page >>> there are several details regarding the protocol utilized for IB but very >>> little for SM. > > Good point. I'll see if we can get some more info up there. > >> I think I found the answer to my question on Jeff Squyres blog: >> http://blogs.cisco.com/performance/shared-memory-as-an-mpi-transport-part-2/ >> >> However now I have a new question, how do I know if my machine uses the >> copyin/copyout mechanism or the direct mapping? > > You need the Linux knem module. See the OMPI README and do a text search for > "knem". > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] openmpi-1.6 undefined reference
On 5/23/2012 3:19 PM, Jeff Squyres (jsquyres) wrote: Just curious - are you running autogen for any particular reason? I don't know how much Cygwin testing we've done. Sent from my phone. No type good. experience says that autoreconf is a good approach on cygwin, it is almost standard on our package build procedure. As autogen is performing the same action I see no reason to bypass it and run a standard autoreconf. Regards Marco
Re: [OMPI users] [EXTERNAL] Re: mpicc link shouldn't add -ldl and -lhwloc
On 5/22/12 10:36 PM, "Orion Poplawski" wrote:
> On 05/22/2012 10:34 PM, Orion Poplawski wrote:
>> On 05/21/2012 06:15 PM, Jeff Squyres wrote:
>>> On May 15, 2012, at 10:37 AM, Orion Poplawski wrote:
>>>> $ mpicc -showme:link
>>>> -pthread -m64 -L/usr/lib64/openmpi/lib -lmpi -ldl -lhwloc
>>>>
>>>> -ldl and -lhwloc should not be listed. The user should only link against libraries that they are using directly, namely -lmpi, and they should explicitly add -ldl and -lhwloc if their code directly uses those libraries. There does not appear to be any reference to libdl or libhwloc symbols in the openmpi headers, which is the other place they could come in.
>>>
>>> I just read this a few times, and I admit that I'm a little confused. libmpi does use symbols from libdl; we use it to load OMPI's plugins. So I'm not sure why you're saying we shouldn't -ldl in the wrapper compiler...? libhwloc might be a little questionable here -- I'll have to check to see whether 1.6 uses hwloc only in a plugin or whether it's used in the base library (I don't remember offhand).
>>
>> But libmpi is already linked against libdl and libhwloc. The wrapper libraries are added when linking user code. But unless a user's code directly uses libdl or libhwloc, they don't need to link to those libraries.
>
> I should add the caveat that they are needed when linking statically, but not when using shared libraries.

And therein lies the problem. We have a number of users who build Open MPI statically, and even some who build both static and shared libraries in the same build. We've never been able to figure out a reasonable way to guess whether we need to add -lhwloc or -ldl, so we add them. It's better to list them and have some redundant dependencies (since you have to have the library anyway) than to not list them and have odd link errors.

We're open to better suggestions, but keep in mind that they have to be portable and must not require users to change their build systems (i.e., we can't just depend on libtool to do everything for us).

Brian

--
Brian W. Barrett
Dept. 1423: Scalable System Software
Sandia National Laboratories
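If you build only shared libraries and want a leaner link line, one workaround is editing the libs field of the wrapper data file by hand (the path and exact contents below are illustrative of a typical install, not guaranteed):

$ grep '^libs=' /usr/lib64/openmpi/share/openmpi/mpicc-wrapper-data.txt
libs=-lmpi -ldl -lhwloc

Removing -ldl and -lhwloc there changes what mpicc appends -- just remember to restore them if you ever link statically.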
Re: [OMPI users] openmpi-1.6 undefined reference
Just curious - are you running autogen for any particular reason? I don't know how much Cygwin testing we've done.

Sent from my phone. No type good.

On May 23, 2012, at 6:09 AM, "Ralph Castain" wrote:

> Add "libompitrace" to your enable-contrib-no-build list. There is likely a missing include in there, but you don't need that lib to run. We'll take a look at it.
>
> On May 23, 2012, at 12:58 AM, marco atzeri wrote:
>
>> I am trying to build openmpi-1.6 for cygwin with dynamic libs
>>
>> -
>> ./autogen.sh
>> cd build_dir
>> source_dir/configure \
>>   LDFLAGS="-Wl,--export-all-symbols -no-undefined" \
>>   --disable-mca-dso \
>>   --without-udapl \
>>   --enable-cxx-exceptions \
>>   --enable-mpi-threads \
>>   --enable-progress-threads \
>>   --with-threads=posix \
>>   --without-cs-fs \
>>   --enable-heterogeneous \
>>   --with-mpi-param_check=always \
>>   --enable-contrib-no-build=vt \
>>   --enable-mca-nobuild=memory_mallopt,paffinity,installdirs-windows,timer-windows,shmem-sysv
>> make
>> -
>>
>> the build stops here:
>>
>> CCLD libompitrace.la
>> Creating library file: .libs/libompitrace.dll.a
>> .libs/abort.o: In function `MPI_Abort':
>> /pub/devel/openmpi/openmpi-1.6-2/src/openmpi-1.6/ompi/contrib/libompitrace/abort.c:32: undefined reference to `_ompi_mpi_comm_world'
>> /pub/devel/openmpi/openmpi-1.6-2/src/openmpi-1.6/ompi/contrib/libompitrace/abort.c:32: undefined reference to `_PMPI_Comm_rank'
>> /pub/devel/openmpi/openmpi-1.6-2/src/openmpi-1.6/ompi/contrib/libompitrace/abort.c:33: undefined reference to `_PMPI_Comm_get_name'
>> /pub/devel/openmpi/openmpi-1.6-2/src/openmpi-1.6/ompi/contrib/libompitrace/abort.c:38: undefined reference to `_PMPI_Abort'
>>
>> I do not find "ompi_mpi_comm_world" defined in any of the already built objects, except
>>
>> ./ompi/communicator/.libs/comm_init.o
>> 0200 C _ompi_mpi_comm_world
>>
>> and in libmpi.dll.a
>>
>> d002278.o:
>> i .idata$4
>> i .idata$5
>> i .idata$6
>> i .idata$7
>> t .text
>> U __head_cygmpi_1_dll
>> I __imp__ompi_mpi_comm_world
>> I __nm__ompi_mpi_comm_world
>>
>> Hint?
>>
>> Marco
Re: [OMPI users] Shared Memory - Eager VS Rendezvous
On May 23, 2012, at 6:05 AM, Simone Pellegrini wrote: >> If process A sends a message to process B and the eager protocol is used >> then I assume that the message is written into a shared memory area and >> picked up by the receiver when the receive operation is posted. Open MPI has a few different shared memory protocols. For short messages, they always follow what you mention above: CICO. For large messages, we either use a pipelined CICO (as you surmised below) or use direct memory mapping if you have the Linux knem kernel module installed. More below. >> When the rendezvous is utilized however the message still need to end up in >> the shared memory area somehow. I don't think any RDMA-like transfer exists >> for shared memory communications. Just to clarify: RDMA = Remote Direct Memory Access, and the "remote" usually refers to a different physical address space (e.g., a different server). In Open MPI's case, knem can use a direct memory copy between two processes. >> Therefore you need to buffer this message somehow, however I assume >> that you don't buffer the whole thing but use some type of pipelined >> protocol so that you reduce the size of the buffer you need to keep in the >> shared memory. Correct. For large messages, when using CICO, we copy the first fragment and the necessary meta data to the shmem block. When the receiver ACKs the first fragment, we pipeline CICO the rest of the large message through the shmem block. With the sender and receiver (more or less) simultaneously writing and reading to the circular shmem block, we probably won't fill it up -- meaning that the sender hypothetically won't need to block. I'm skipping a bunch of details, but that's the general idea. >> Is it completely wrong? It would be nice if someone could point me somewhere >> I can find more details about this. In the OpenMPI tuning page there are >> several details regarding the protocol utilized for IB but very little for >> SM. Good point. I'll see if we can get some more info up there. > I think I found the answer to my question on Jeff Squyres blog: > http://blogs.cisco.com/performance/shared-memory-as-an-mpi-transport-part-2/ > > However now I have a new question, how do I know if my machine uses the > copyin/copyout mechanism or the direct mapping? You need the Linux knem module. See the OMPI README and do a text search for "knem". -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
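As a rough illustration of the pipelining (a conceptual sketch, not the actual Open MPI code -- shm_block_t and the reserve/publish/wait/release helpers are invented stand-ins for the circular shared-memory fragment queue):

    #include <string.h>
    #define FRAG_SIZE 32768  /* assumed fragment size */

    void send_large(shm_block_t *shm, const char *sendbuf, size_t len) {
        size_t sent = 0;
        while (sent < len) {
            size_t n = (len - sent < FRAG_SIZE) ? len - sent : FRAG_SIZE;
            void *frag = reserve_frag(shm);     /* grab a free slot */
            memcpy(frag, sendbuf + sent, n);    /* copy-in */
            publish_frag(shm, frag, n);         /* make visible to receiver */
            sent += n;
        }
    }

    void recv_large(shm_block_t *shm, char *recvbuf, size_t len) {
        size_t recvd = 0;
        while (recvd < len) {   /* runs concurrently with send_large */
            size_t n;
            void *frag = wait_frag(shm, &n);    /* next published slot */
            memcpy(recvbuf + recvd, frag, n);   /* copy-out */
            release_frag(shm, frag);            /* return slot to the queue */
            recvd += n;
        }
    }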
[OMPI users] openmpi-1.6 + Intel compilers
Hello,

I have built openmpi-1.6 for MacOSX Lion 10.7.4 with the Intel compilers v12.1 (icc, icpc, ifort). When I try to use the mpif90 wrapper, I get the error message:

ifort: error #10104: unable to open '-commons'

When I compare "mpif90 -showme" from version 1.5.4 and version 1.6, I find:

/opt/openmpi-1.5.4/bin/mpif90 -showme
/opt/intel/composerxe/bin/ifort -I/opt/openmpi-1.5.4/include -I/opt/openmpi-1.5.4/lib -L/opt/openmpi-1.5.4-v12/lib -lmpi_f90 -lmpi_f77 -lmpi

/opt/openmpi-1.6/bin/mpif90 -showme
/opt/intel/composerxe/bin/ifort -I/opt/openmpi-1.6/include -Wl,-commons,use_dylibs -I/opt/openmpi-1.6/lib -L/opt/openmpi-1.6/lib -lmpi_f90 -lmpi_f77 -lmpi -lm

What does the option -Wl,-commons,use_dylibs do? Can I remove it, and how?

--
Christophe Peyret
ONERA - DSNA - PS3A
http://www.onera.fr/dsna/couplage-methodes-aeroacoustiques
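One possible workaround (untested; the field name and path are assumptions based on the usual layout of Open MPI wrapper data files) would be to edit the linker_flags line in the mpif90 wrapper data file:

$ grep linker_flags /opt/openmpi-1.6/share/openmpi/mpif90-wrapper-data.txt
linker_flags=-Wl,-commons,use_dylibs

Deleting the flag from that line removes it from every mpif90 invocation; alternatively, "mpif90 -showme:link" can be used to assemble the ifort command line by hand without it.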
Re: [OMPI users] openmpi-1.6 undefined reference
Add "libompitrace" to your enable-contrib-no-build list. There is likely a missing include in there, but you don't need that lib to run. We'll take a look at it. On May 23, 2012, at 12:58 AM, marco atzeri wrote: > I am trying to build openmpi-1.6 for cygwin with dynamic libs > > - > ./autogen.sh > cd build_dir > source_dir/configure \ > LDFLAGS="-Wl,--export-all-symbols -no-undefined" \ > --disable-mca-dso \ > --without-udapl \ > --enable-cxx-exceptions \ > --enable-mpi-threads \ > --enable-progress-threads \ > --with-threads=posix \ > --without-cs-fs \ > --enable-heterogeneous \ > --with-mpi-param_check=always \ > --enable-contrib-no-build=vt \ > --enable-mca-nobuild=memory_mallopt,paffinity,installdirs-windows,timer-windows,shmem-sysv > make > - > > the build stop here : > CCLD libompitrace.la > Creating library file: .libs/libompitrace.dll.a.libs/abort.o: In function > `MPI_Abort': > /pub/devel/openmpi/openmpi-1.6-2/src/openmpi-1.6/ompi/contrib/libompitrace/abort.c:32: > undefined reference to `_o mpi_mpi_comm_world' > /pub/devel/openmpi/openmpi-1.6-2/src/openmpi-1.6/ompi/contrib/libompitrace/abort.c:32: > undefined reference to `_P MPI_Comm_rank' > /pub/devel/openmpi/openmpi-1.6-2/src/openmpi-1.6/ompi/contrib/libompitrace/abort.c:33: > undefined reference to `_P MPI_Comm_get_name' > /pub/devel/openmpi/openmpi-1.6-2/src/openmpi-1.6/ompi/contrib/libompitrace/abort.c:38: > undefined reference to `_P MPI_Abort' > > I do not find "mpi_mpi_comm_world" defined in any of the > already built objects, except > > ./ompi/communicator/.libs/comm_init.o > 0200 C _ompi_mpi_comm_world > > and on libmpi.dll.a > > d002278.o: > i .idata$4 > i .idata$5 > i .idata$6 > i .idata$7 > t .text > U __head_cygmpi_1_dll > I __imp__ompi_mpi_comm_world > I __nm__ompi_mpi_comm_world > > > Hint ? > > Marco > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Shared Memory - Eager VS Rendezvous
I think I found the answer to my question on Jeff Squyres' blog: http://blogs.cisco.com/performance/shared-memory-as-an-mpi-transport-part-2/

However, now I have a new question: how do I know if my machine uses the copyin/copyout mechanism or the direct mapping, assuming that I am running Open MPI 1.5.x installed on top of a Linux 2.6.32 kernel?

cheers, Simone

On 05/22/2012 05:29 PM, Simone Pellegrini wrote:

> Dear all,
> I would like to have confirmation of the assumptions I have on how Open MPI implements the rendezvous protocol for shared memory.
>
> If process A sends a message to process B and the eager protocol is used, then I assume that the message is written into a shared memory area and picked up by the receiver when the receive operation is posted.
>
> When the rendezvous protocol is utilized, however, the message still needs to end up in the shared memory area somehow. I don't think any RDMA-like transfer exists for shared memory communications. Therefore you need to buffer this message somehow; however, I assume that you don't buffer the whole thing but use some type of pipelined protocol so that you reduce the size of the buffer you need to keep in shared memory.
>
> Is this completely wrong? It would be nice if someone could point me somewhere I can find more details about this. On the Open MPI tuning page there are several details regarding the protocol utilized for IB but very little for SM.
>
> thanks in advance, Simone P.
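Following up on the knem pointer from the other message in this thread: the direct-mapping path needs both an Open MPI build compiled with knem support and a loaded knem module, which (in knem's default configuration) shows up as a /dev/knem device node. A stock 2.6.32 kernel does not ship knem, so without installing the module you would be on the copyin/copyout path. A minimal check, assuming the default device node:

/* knem_check.c: report whether the knem device is present and usable.
 * Its presence is a prerequisite for Open MPI's direct-copy shared
 * memory path; without it, large messages use pipelined copy-in/copy-out.
 * (Open MPI must additionally have been built with knem support.) */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    if (access("/dev/knem", R_OK | W_OK) == 0)
        printf("/dev/knem is usable: direct mapping is possible\n");
    else
        printf("no usable /dev/knem: expect copy-in/copy-out\n");
    return 0;
}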
Re: [OMPI users] MPI_COMPLEX16
On 05/23/2012 07:30 PM, Patrick Le Dot wrote:
> David Singleton writes:
>> I should have checked earlier - same for MPI_COMPLEX and MPI_COMPLEX8.
>>
>> David
>>
>> On 04/27/2012 08:43 AM, David Singleton wrote:
>>> Apologies if this has already been covered somewhere. One of our users has noticed that MPI_COMPLEX16 is flagged as an invalid type in 1.5.4 but not in 1.4.3, while MPI_DOUBLE_COMPLEX is accepted by both. This is with either gfortran or intel-fc.
>>> ...
>
> Hi,
> I hit the same problem: MPI_COMPLEX8 and MPI_COMPLEX16 were available in v1.4 but were removed in v1.5, and I don't understand why, except that these types are not in the standard... I have a patch to reintroduce them, so let me know what you think.

I would very much appreciate seeing that patch.

Thanks,
David
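In the meantime, a given build can be probed at run time for whether it accepts these optional types. A minimal sketch in C, assuming mpi.h exposes MPI_COMPLEX16 as a preprocessor macro (true of Open MPI's header, but worth verifying), and relying on MPI_ERRORS_RETURN to turn the "invalid datatype" abort into a testable return code:

/* complex16_probe.c: does this MPI build accept MPI_COMPLEX16? */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int size = 0, err;

    MPI_Init(&argc, &argv);
    /* errors raised by MPI_Type_size are reported through
     * MPI_COMM_WORLD's error handler, so make them non-fatal */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
#ifdef MPI_COMPLEX16
    err = MPI_Type_size(MPI_COMPLEX16, &size);
    if (err == MPI_SUCCESS)
        printf("MPI_COMPLEX16 accepted, size = %d bytes\n", size);
    else
        printf("MPI_COMPLEX16 rejected by this build (error %d)\n", err);
#else
    (void)err;
    printf("MPI_COMPLEX16 is not defined by this mpi.h\n");
#endif
    MPI_Finalize();
    return 0;
}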
Re: [OMPI users] MPI_COMPLEX16
David Singleton <... at anu.edu.au> writes:
>
> I should have checked earlier - same for MPI_COMPLEX and MPI_COMPLEX8.
>
> David
>
> On 04/27/2012 08:43 AM, David Singleton wrote:
>>
>> Apologies if this has already been covered somewhere. One of our users has noticed that MPI_COMPLEX16 is flagged as an invalid type in 1.5.4 but not in 1.4.3, while MPI_DOUBLE_COMPLEX is accepted by both. This is with either gfortran or intel-fc.
>> ...

Hi,
I hit the same problem: MPI_COMPLEX8 and MPI_COMPLEX16 were available in v1.4 but were removed in v1.5, and I don't understand why, except that these types are not in the standard... I have a patch to reintroduce them, so let me know what you think.

Thanks,
Patrick
[OMPI users] openmpi-1.6 undefined reference
I am trying to build openmpi-1.6 for cygwin with dynamic libs:

-
./autogen.sh
cd build_dir
source_dir/configure \
  LDFLAGS="-Wl,--export-all-symbols -no-undefined" \
  --disable-mca-dso \
  --without-udapl \
  --enable-cxx-exceptions \
  --enable-mpi-threads \
  --enable-progress-threads \
  --with-threads=posix \
  --without-cs-fs \
  --enable-heterogeneous \
  --with-mpi-param_check=always \
  --enable-contrib-no-build=vt \
  --enable-mca-nobuild=memory_mallopt,paffinity,installdirs-windows,timer-windows,shmem-sysv
make
-

The build stops here:

  CCLD libompitrace.la
Creating library file: .libs/libompitrace.dll.a
.libs/abort.o: In function `MPI_Abort':
/pub/devel/openmpi/openmpi-1.6-2/src/openmpi-1.6/ompi/contrib/libompitrace/abort.c:32: undefined reference to `_ompi_mpi_comm_world'
/pub/devel/openmpi/openmpi-1.6-2/src/openmpi-1.6/ompi/contrib/libompitrace/abort.c:32: undefined reference to `_PMPI_Comm_rank'
/pub/devel/openmpi/openmpi-1.6-2/src/openmpi-1.6/ompi/contrib/libompitrace/abort.c:33: undefined reference to `_PMPI_Comm_get_name'
/pub/devel/openmpi/openmpi-1.6-2/src/openmpi-1.6/ompi/contrib/libompitrace/abort.c:38: undefined reference to `_PMPI_Abort'

I do not find "ompi_mpi_comm_world" defined in any of the already built objects, except:

./ompi/communicator/.libs/comm_init.o
0200 C _ompi_mpi_comm_world

and in libmpi.dll.a:

d002278.o:
i .idata$4
i .idata$5
i .idata$6
i .idata$7
t .text
U __head_cygmpi_1_dll
I __imp__ompi_mpi_comm_world
I __nm__ompi_mpi_comm_world

Hint?

Marco
Re: [OMPI users] mpicc link shouldn't add -ldl and -lhwloc
On 05/22/2012 10:34 PM, Orion Poplawski wrote:
> On 05/21/2012 06:15 PM, Jeff Squyres wrote:
>> On May 15, 2012, at 10:37 AM, Orion Poplawski wrote:
>>> $ mpicc -showme:link
>>> -pthread -m64 -L/usr/lib64/openmpi/lib -lmpi -ldl -lhwloc
>>>
>>> -ldl and -lhwloc should not be listed. The user should only link against libraries that they are using directly, namely -lmpi, and they should explicitly add -ldl and -lhwloc if their code directly uses those libraries. There do not appear to be any references to libdl or libhwloc symbols in the openmpi headers, which is the other place they could come in.
>>
>> I just read this a few times, and I admit that I'm a little confused. libmpi does use symbols from libdl; we use it to load OMPI's plugins. So I'm not sure why you're saying we shouldn't add -ldl in the wrapper compiler...?
>>
>> libhwloc might be a little questionable here -- I'll have to check to see whether 1.6 uses hwloc only in a plugin or whether it's used in the base library (I don't remember offhand).
>
> But libmpi is already linked against libdl and libhwloc. The wrapper libraries are added when linking user code. But unless a user's code directly uses libdl or libhwloc, they don't need to link to those libraries.

I should add the caveat that they are needed when linking statically, but not when using shared libraries.

--
Orion Poplawski
Technical Manager                   303-415-9701 x222
NWRA/CoRA Division                  FAX: 303-415-9702
3380 Mitchell Lane                  or...@cora.nwra.com
Boulder, CO 80301                   http://www.cora.nwra.com
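The caveat generalizes to a simple rule of thumb: pass -ldl yourself exactly when your own code calls into libdl; otherwise the dependency is libmpi's, recorded inside the shared library itself. A minimal sketch of the direct-use case (the dlopen'd library name is just a placeholder):

/* uses_dl.c: this program calls dlopen() directly, so its link line
 * needs -ldl no matter what the mpicc wrapper adds or omits:
 *     mpicc uses_dl.c -o uses_dl -ldl
 * A plain MPI program, by contrast, reaches libdl only transitively
 * through the shared libmpi, which carries that dependency itself. */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    void *handle = dlopen("libm.so.6", RTLD_NOW);  /* placeholder library */
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    printf("dlopen succeeded; this libdl dependency is ours, not libmpi's\n");
    dlclose(handle);
    return 0;
}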
Re: [OMPI users] mpicc link shouldn't add -ldl and -lhwloc
On 05/21/2012 06:15 PM, Jeff Squyres wrote:
> On May 15, 2012, at 10:37 AM, Orion Poplawski wrote:
>> $ mpicc -showme:link
>> -pthread -m64 -L/usr/lib64/openmpi/lib -lmpi -ldl -lhwloc
>>
>> -ldl and -lhwloc should not be listed. The user should only link against libraries that they are using directly, namely -lmpi, and they should explicitly add -ldl and -lhwloc if their code directly uses those libraries. There do not appear to be any references to libdl or libhwloc symbols in the openmpi headers, which is the other place they could come in.
>
> I just read this a few times, and I admit that I'm a little confused. libmpi does use symbols from libdl; we use it to load OMPI's plugins. So I'm not sure why you're saying we shouldn't add -ldl in the wrapper compiler...?
>
> libhwloc might be a little questionable here -- I'll have to check to see whether 1.6 uses hwloc only in a plugin or whether it's used in the base library (I don't remember offhand).

But libmpi is already linked against libdl and libhwloc. The wrapper libraries are added when linking user code. But unless a user's code directly uses libdl or libhwloc, they don't need to link to those libraries.

--
Orion Poplawski
Technical Manager                   303-415-9701 x222
NWRA/CoRA Division                  FAX: 303-415-9702
3380 Mitchell Lane                  or...@cora.nwra.com
Boulder, CO 80301                   http://www.cora.nwra.com