Re: [OMPI devel] 1.7rc8 is posted
On Wednesday 27 February 2013 17:52:42 Jeff Squyres wrote:
> The goal is to release 1.7 (final) by the end of this week. New rc posted
> with fairly small changes:
>
> http://www.open-mpi.org/software/ompi/v1.7/
>
> - Fix wrong header file / compilation error in bcol
> - Support MXM STREAM for isend and irecv
> - Make sure "mpirun " fails with $status!=0
> - Bunches of cygwin minor fixes
> - Make sure the fortran compiler supports BIND(C) with LOGICAL for the F08
>   bindings
> - Fix --disable-mpi-io with the F08 bindings

Built on CentOS-6.3 + MLNXOFED-1.5.3-310 using intel-13.1.0: OK
Built with cma: OK
Built with mxm (1.5.0eb2be5): OK
Built with slurm: OK
Launch correctness: OK
IMB on 32 and 128 ranks, all combos of mxm, verbs, cma, no cma: OK(*)

(*) of course there are lots of spots-o-performance-sucking

Good luck with 1.7 final,
Peter
Re: [OMPI devel] 1.7rc8 is posted
Regarding Open64: strangely, I need two different versions of the compiler, one for the Opteron family 15h and one for the general x86-64 architecture. This complicates things, since my head node doesn't have an Opteron family 15h processor. You can have a look at this topic: http://devgurus.amd.com/thread/160180

I've tried building the same version on a node with 6380 processors. Configuration was successful, but make failed in the following way:

libtool: compile: opencc -DHAVE_CONFIG_H -I. -DLTDLOPEN=libltdlc "-DLT_CONFIG_H=" -DLTDL -I. -I. -Ilibltdl -I./libltdl -I./libltdl -I/tmp/mpi_install_tmp26482/openmpi-1.7rc8/opal/mca/hwloc/hwloc151/hwloc/include -I/tmp/mpi_install_tmp26482/openmpi-1.7rc8/opal/mca/event/libevent2019/libevent -I/tmp/mpi_install_tmp26482/openmpi-1.7rc8/opal/mca/event/libevent2019/libevent/include -I/usr/include/infiniband -I/usr/include/infiniband -I/usr/include/infiniband -I/usr/include/infiniband -I/usr/include/infiniband -I/usr/include/infiniband -O3 -DNDEBUG -fvisibility=hidden -MT libltdlc_la-ltdl.lo -MD -MP -MF .deps/libltdlc_la-ltdl.Tpo -c ltdl.c -fPIC -DPIC -o .libs/libltdlc_la-ltdl.o
/tmp/ccspin#.cVv00f.s: Assembler messages:
/tmp/ccspin#.cVv00f.s:860: Error: no such instruction: `bextr $257,%esi,%esi'
/tmp/ccspin#.cVv00f.s:868: Error: no such instruction: `bextr $258,%edi,%edi'
/tmp/ccspin#.cVv00f.s:876: Error: no such instruction: `bextr $259,%eax,%eax'
make[3]: *** [libltdlc_la-ltdl.lo] Error 1
make[3]: Leaving directory `/tmp/mpi_install_tmp26482/openmpi-1.7rc8/opal/libltdl'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/tmp/mpi_install_tmp26482/openmpi-1.7rc8/opal/libltdl'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tmp/mpi_install_tmp26482/openmpi-1.7rc8/opal'
make: *** [all-recursive] Error 1

I guess that this issue has more to do with the compiler than with Open MPI. Let me know if you need me to run any additional tests.

Regards,
Pavel Mezentsev.
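The assembler errors above involve `bextr` with an immediate operand, which as far as I can tell is the form from AMD's TBM extension, so one quick sanity check is whether a given node's CPU advertises the relevant feature flags. The sketch below parses text in the Linux /proc/cpuinfo format. The flag names `bmi1` and `tbm` are the ones Linux normally reports, but the choice of flags to check is an assumption made for illustration, not something stated in the thread, and the helper names are hypothetical. Note that this only covers the CPU side; the "no such instruction" message itself comes from the assembler, so the installed binutils version matters as well.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted=("bmi1", "tbm")):
    """List the wanted flags (an assumed, illustrative set) that the CPU does not report."""
    have = cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in have]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            text = fh.read()
    except OSError:
        text = ""  # not Linux, or /proc unavailable: every wanted flag is reported missing
    print("missing flags:", missing_flags(text))
```

Running this on both the head node and a compute node would show whether binaries built for family 15h can be expected to run on the head node at all.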
Re: [OMPI devel] 1.7rc8 is posted
On Feb 28, 2013, at 12:04 PM, Pavel Mezentsev wrote:

> Do you mean the logs from failed attempts? They are enclosed. If you need the
> successful logs I'll need to make them again since the files from successful
> builds are deleted.

You guessed right; I need the logs from the failed builds.

It looks like your openf95 compiler is generating borked executables:

-----
configure:31019: checking KIND value of Fortran C_SIGNED_CHAR
configure:31046: openf95 -o conftest conftest.f90 >&5
configure:31046: $? = 0
configure:31046: ./conftest
./configure: line 4343: 1234 Illegal instruction (core dumped) ./conftest$ac_exeext
configure:31046: $? = 132
configure: program exited with status 132
configure: failed program was:
| program main
|
| use, intrinsic :: ISO_C_BINDING
| open(unit = 7, file = "conftest.out")
| write(7, *) C_SIGNED_CHAR
| close(7)
|
| end
-----

There's no reason the above Fortran program should fail with "illegal instruction".

> I am not using MXM. The results with the option you suggested are the same
> as before:

We're investigating the latency issue.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
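The "$? = 132" in the configure log above matches the shell's own "Illegal instruction" message: POSIX shells report a process killed by signal N as exit status 128 + N, and 132 - 128 = 4 is SIGILL. Below is a minimal sketch of that decoding. The helper name is hypothetical, and the 128 + N rule is a shell convention, so a program that deliberately exits with a status above 128 would be misread by it.

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status, assuming the 128 + signal-number convention."""
    if status > 128:
        signum = status - 128
        try:
            return "killed by " + signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform
            return "killed by signal %d" % signum
    return "exited with status %d" % status


if __name__ == "__main__":
    print(describe_exit_status(132))
```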
Re: [OMPI devel] 1.7rc8 is posted
Do you mean the logs from failed attempts? They are enclosed. If you need the successful logs I'll need to make them again since the files from successful builds are deleted.

I am not using MXM. The results with the option you suggested are the same as before:

#---
# Benchmarking PingPong
# #processes = 2
#---
   #bytes  #repetitions     t[usec]  Mbytes/sec
        0          1000        1.49        0.00
        1          1000        1.58        0.61
        2          1000        1.12        1.71
        4          1000        1.10        3.48
        8          1000        1.11        6.90
       16          1000        1.11       13.69
       32          1000        1.12       27.21
       64          1000        1.16       52.52
      128          1000        1.72       70.83
      256          1000        1.84      132.72
      512          1000        1.99      245.74
     1024          1000        2.25      433.92
     2048          1000        2.87      680.54
     4096          1000        3.52     1109.13
     8192          1000        4.68     1670.60
    16384          1000        9.66     1617.91
    32768          1000       14.30     2185.24
    65536           640       23.45     2665.33
   131072           320       35.99     3473.15
   262144           160       58.05     4306.65
   524288            80      101.94     4904.69
  1048576            40      188.65     5300.86
  2097152            20      526.05     3801.94
  4194304            10     1096.09     3649.32

#---
# Benchmarking PingPing
# #processes = 2
#---
   #bytes  #repetitions     t[usec]  Mbytes/sec
        0          1000        1.10        0.00
        1          1000        1.24        0.77
        2          1000        1.23        1.55
        4          1000        1.23        3.10
        8          1000        1.25        6.09
       16          1000        1.14       13.41
       32          1000        1.11       27.40
       64          1000        1.16       52.75
      128          1000        1.71       71.34
      256          1000        1.84      132.33
      512          1000        1.98      246.63
     1024          1000        2.27      429.26
     2048          1000        2.91      672.30
     4096          1000        3.52     1109.43
     8192          1000        4.80     1627.25
    16384          1000        9.98     1565.64
    32768          1000       14.70     2125.14
    65536           640       24.18     2584.97
   131072           320       37.33     3348.95
   262144           160       60.59     4125.82
   524288            80      105.83     4724.78
  1048576            40      197.82     5055.05
  2097152            20      791.35     2527.34
  4194304            10     1820.30     2197.44

Regards,
Pavel Mezentsev.
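For reading the IMB tables in this message, the Mbytes/sec column appears to be the message size divided by the reported time, with 1 Mbyte taken as 2^20 bytes. That definition is inferred from the numbers themselves rather than taken from the IMB documentation, so treat it as an assumption. Here is a minimal sketch (the function name is hypothetical) that reproduces the column to within the rounding of t[usec]:

```python
def imb_mbytes_per_sec(nbytes, t_usec):
    """Throughput in Mbytes/sec, assuming 1 Mbyte = 2**20 bytes (inferred from the tables)."""
    if t_usec <= 0:
        raise ValueError("t_usec must be positive")
    return (nbytes / 2.0**20) / (t_usec * 1e-6)


if __name__ == "__main__":
    # (bytes, t[usec]) rows copied from the PingPong table
    for nbytes, t_usec in [(1048576, 188.65), (2097152, 526.05), (4194304, 1096.09)]:
        print(nbytes, round(imb_mbytes_per_sec(nbytes, t_usec), 2))
```

The recomputed values differ from the table in the last digits because the table's t[usec] is already rounded to two decimals.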
Re: [OMPI devel] 1.7rc8 is posted
On Feb 27, 2013, at 7:36 PM, Pavel Mezentsev wrote:

> I've tried the new rc. Here is what I got:

Thanks for testing.

> 1) I've successfully built it with intel-13.1 and gcc-4.7.2. But I've failed
> while using open64-4.5.2 and ekopath-5.0.0 (pathscale). The problems are in
> the fortran part. In each case I've used the following configuration line:
> CC=$CC CXX=$CXX F77=$F77 FC=$FC ./configure --prefix=$prefix --with-knem=$knem_path
> Open64 failed during configuration with the following:
> *** Fortran compiler
> checking whether we are using the GNU Fortran compiler... yes
> checking whether openf95 accepts -g... yes
> configure: WARNING: Open MPI now ignores the F77 and FFLAGS environment variables; only the FC and FCFLAGS environment variables are used.
> checking whether ln -s works... yes
> checking if Fortran compiler works... yes
> checking for extra arguments to build a shared library... none needed
> checking for Fortran flag to compile .f files... none
> checking for Fortran flag to compile .f90 files... none
> checking to see if Fortran compilers need additional linker flags... none
> checking external symbol convention... double underscore
> checking if C and Fortran are link compatible... yes
> checking to see if Fortran compiler likes the C++ exception flags... skipped (no C++ exceptions flags)
> checking to see if mpifort compiler needs additional linker flags... none
> checking if Fortran compiler supports CHARACTER... yes
> checking size of Fortran CHARACTER... 1
> checking for C type corresponding to CHARACTER... char
> checking alignment of Fortran CHARACTER... 1
> checking for corresponding KIND value of CHARACTER... C_SIGNED_CHAR
> checking KIND value of Fortran C_SIGNED_CHAR... no ISO_C_BINDING -- fallback
> checking Fortran value of selected_int_kind(4)... no
> configure: WARNING: Could not determine KIND value of C_SIGNED_CHAR
> configure: WARNING: See config.log for more details
> configure: error: Cannot continue

Please send the full configure output as well as the config.log file (please compress).

> Ekopath failed during make with the following error:
> PPFC mpi-f08-sizeof.lo
> PPFC mpi-f08.lo
> In file included from mpi-f08.F90:37:
> mpi-f-interfaces-bind.h:1908: warning: extra tokens at end of #endif directive
> mpi-f-interfaces-bind.h:2957: warning: extra tokens at end of #endif directive
> In file included from mpi-f08.F90:38:
> pmpi-f-interfaces-bind.h:1911: warning: extra tokens at end of #endif directive
> pmpi-f-interfaces-bind.h:2963: warning: extra tokens at end of #endif directive
> pathf95-1044 pathf95: INTERNAL OMPI_OP_CREATE_F, File = mpi-f-interfaces-bind.h, Line = 955, Column = 29
> Internal : Unexpected ATP_PGM_UNIT in check_interoperable_pgm_unit()

I've pinged Pathscale about this.

> 2) I've run a couple of tests (IMB) with the new version. I ran this on a
> system consisting of 10 nodes with Intel SB processors and FDR ConnectX3
> InfiniBand adapters.
> First I've tried the following parameters:
> mpirun -np $NP -hostfile hosts --mca btl openib,sm,self --bind-to-core -npernode 16 --mca mpi_leave_pinned 1 ./IMB-MPI1 -npmin $NP -mem 4G $COLL
> This combination complained about mpi_leave_pinned. The same line works for
> 1.6.3. Is something different in the new release and I've missed it?
> --
> A process attempted to use the "leave pinned" MPI feature, but no
> memory registration hooks were found on the system at run time. This
> may be the result of running on a system that does not support memory
> hooks or having some other software subvert Open MPI's use of the
> memory hooks. You can disable Open MPI's use of memory hooks by
> setting both the mpi_leave_pinned and mpi_leave_pinned_pipeline MCA
> parameters to 0.

This should not be, and might explain your lower performance on the IMB results.

Nathan -- you reported that you saw something like this before, but were then unable to reproduce. Any ideas what's going on here?

Mellanox? (although the short message latency is troubling...)

Can you ensure that you aren't using MXM in 1.7? I understand that its short message latency is worse than RC verbs. You'll need to add "--mca pml ob1" in your command line.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] 1.7rc8 is posted
Sweet -- thanks for the quick test!

On Feb 27, 2013, at 7:54 PM, marco atzeri wrote:

> On 2/27/2013 6:52 PM, Jeff Squyres (jsquyres) wrote:
>> The goal is to release 1.7 (final) by the end of this week. New rc posted
>> with fairly small changes:
>>
>> http://www.open-mpi.org/software/ompi/v1.7/
>>
>> - Fix wrong header file / compilation error in bcol
>> - Support MXM STREAM for isend and irecv
>> - Make sure "mpirun " fails with $status!=0
>> - Bunches of cygwin minor fixes
>> - Make sure the fortran compiler supports BIND(C) with LOGICAL for the F08
>>   bindings
>> - Fix --disable-mpi-io with the F08 bindings
>
> builds and passes tests on cygwin
>
> Regards
> Marco

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] 1.7rc8 is posted
On 2/27/2013 6:52 PM, Jeff Squyres (jsquyres) wrote:
> The goal is to release 1.7 (final) by the end of this week. New rc posted
> with fairly small changes:
>
> http://www.open-mpi.org/software/ompi/v1.7/
>
> - Fix wrong header file / compilation error in bcol
> - Support MXM STREAM for isend and irecv
> - Make sure "mpirun " fails with $status!=0
> - Bunches of cygwin minor fixes
> - Make sure the fortran compiler supports BIND(C) with LOGICAL for the F08
>   bindings
> - Fix --disable-mpi-io with the F08 bindings

builds and passes tests on cygwin

Regards
Marco
Re: [OMPI devel] 1.7rc8 is posted
I've tried the new rc. Here is what I got:

1) I've successfully built it with intel-13.1 and gcc-4.7.2. But I've failed while using open64-4.5.2 and ekopath-5.0.0 (pathscale). The problems are in the fortran part. In each case I've used the following configuration line:

CC=$CC CXX=$CXX F77=$F77 FC=$FC ./configure --prefix=$prefix --with-knem=$knem_path

Open64 failed during configuration with the following:

*** Fortran compiler
checking whether we are using the GNU Fortran compiler... yes
checking whether openf95 accepts -g... yes
configure: WARNING: Open MPI now ignores the F77 and FFLAGS environment variables; only the FC and FCFLAGS environment variables are used.
checking whether ln -s works... yes
checking if Fortran compiler works... yes
checking for extra arguments to build a shared library... none needed
checking for Fortran flag to compile .f files... none
checking for Fortran flag to compile .f90 files... none
checking to see if Fortran compilers need additional linker flags... none
checking external symbol convention... double underscore
checking if C and Fortran are link compatible... yes
checking to see if Fortran compiler likes the C++ exception flags... skipped (no C++ exceptions flags)
checking to see if mpifort compiler needs additional linker flags... none
checking if Fortran compiler supports CHARACTER... yes
checking size of Fortran CHARACTER... 1
checking for C type corresponding to CHARACTER... char
checking alignment of Fortran CHARACTER... 1
checking for corresponding KIND value of CHARACTER... C_SIGNED_CHAR
checking KIND value of Fortran C_SIGNED_CHAR... no ISO_C_BINDING -- fallback
checking Fortran value of selected_int_kind(4)... no
configure: WARNING: Could not determine KIND value of C_SIGNED_CHAR
configure: WARNING: See config.log for more details
configure: error: Cannot continue

Ekopath failed during make with the following error:

PPFC mpi-f08-sizeof.lo
PPFC mpi-f08.lo
In file included from mpi-f08.F90:37:
mpi-f-interfaces-bind.h:1908: warning: extra tokens at end of #endif directive
mpi-f-interfaces-bind.h:2957: warning: extra tokens at end of #endif directive
In file included from mpi-f08.F90:38:
pmpi-f-interfaces-bind.h:1911: warning: extra tokens at end of #endif directive
pmpi-f-interfaces-bind.h:2963: warning: extra tokens at end of #endif directive
pathf95-1044 pathf95: INTERNAL OMPI_OP_CREATE_F, File = mpi-f-interfaces-bind.h, Line = 955, Column = 29
Internal : Unexpected ATP_PGM_UNIT in check_interoperable_pgm_unit()
make[2]: *** [mpi-f08.lo] Error 1
make[2]: Leaving directory `/tmp/mpi_install_tmp1400/openmpi-1.7rc8/ompi/mpi/fortran/use-mpi-f08'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tmp/mpi_install_tmp1400/openmpi-1.7rc8/ompi'
make: *** [all-recursive] Error 1

It seems to be different from the error I got last time with rc7. And again, I'm not enough of a Fortran guy to understand this error. I've used the following version of the compiler: http://c591116.r16.cf2.rackcdn.com/ekopath/nightly/Linux/ekopath-2013-02-26-installer.run

2) I've run a couple of tests (IMB) with the new version. I ran this on a system consisting of 10 nodes with Intel SB processors and FDR ConnectX3 InfiniBand adapters.

First I've tried the following parameters:

mpirun -np $NP -hostfile hosts --mca btl openib,sm,self --bind-to-core -npernode 16 --mca mpi_leave_pinned 1 ./IMB-MPI1 -npmin $NP -mem 4G $COLL

This combination complained about mpi_leave_pinned. The same line works for 1.6.3. Is something different in the new release and I've missed it?

--
A process attempted to use the "leave pinned" MPI feature, but no
memory registration hooks were found on the system at run time. This
may be the result of running on a system that does not support memory
hooks or having some other software subvert Open MPI's use of the
memory hooks. You can disable Open MPI's use of memory hooks by
setting both the mpi_leave_pinned and mpi_leave_pinned_pipeline MCA
parameters to 0.

Open MPI will disable any transports that are attempting to use the
leave pinned functionality; your job may still run, but may fall back
to a slower network transport (such as TCP).

Mpool name: grdma
Process: [[13305,1],1]
Local host: b23
--
--
WARNING: There is at least one OpenFabrics device found but there are
no active ports detected (or Open MPI was unable to use them). This
is most certainly not what you wanted. Check your cables, subnet
manager configuration, etc. The openib BTL will be ignored for this
job.

Local host: b23
--
--
At least one pair of MPI processes are unable to reach each other for MPI communications.
[OMPI devel] 1.7rc8 is posted
The goal is to release 1.7 (final) by the end of this week. New rc posted with fairly small changes:

http://www.open-mpi.org/software/ompi/v1.7/

- Fix wrong header file / compilation error in bcol
- Support MXM STREAM for isend and irecv
- Make sure "mpirun " fails with $status!=0
- Bunches of cygwin minor fixes
- Make sure the fortran compiler supports BIND(C) with LOGICAL for the F08 bindings
- Fix --disable-mpi-io with the F08 bindings

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/