Re: [OMPI devel] 1.7rc8 is posted
Regarding Open64: the strange thing is that I need two different versions of the compiler, one for Opteron family 15h and one for the generic x86-64 architecture. This makes things quite complicated, since my head node doesn't have a family 15h processor. You can have a look at this topic: http://devgurus.amd.com/thread/160180

I've tried building the same version on a node with 6380 processors. Configuration was successful, but make failed in the following way:

libtool: compile:  opencc -DHAVE_CONFIG_H -I. -DLTDLOPEN=libltdlc "-DLT_CONFIG_H=" -DLTDL -I. -I. -Ilibltdl -I./libltdl -I./libltdl -I/tmp/mpi_install_tmp26482/openmpi-1.7rc8/opal/mca/hwloc/hwloc151/hwloc/include -I/tmp/mpi_install_tmp26482/openmpi-1.7rc8/opal/mca/event/libevent2019/libevent -I/tmp/mpi_install_tmp26482/openmpi-1.7rc8/opal/mca/event/libevent2019/libevent/include -I/usr/include/infiniband -I/usr/include/infiniband -I/usr/include/infiniband -I/usr/include/infiniband -I/usr/include/infiniband -I/usr/include/infiniband -O3 -DNDEBUG -fvisibility=hidden -MT libltdlc_la-ltdl.lo -MD -MP -MF .deps/libltdlc_la-ltdl.Tpo -c ltdl.c -fPIC -DPIC -o .libs/libltdlc_la-ltdl.o
/tmp/ccspin#.cVv00f.s: Assembler messages:
/tmp/ccspin#.cVv00f.s:860: Error: no such instruction: `bextr $257,%esi,%esi'
/tmp/ccspin#.cVv00f.s:868: Error: no such instruction: `bextr $258,%edi,%edi'
/tmp/ccspin#.cVv00f.s:876: Error: no such instruction: `bextr $259,%eax,%eax'
make[3]: *** [libltdlc_la-ltdl.lo] Error 1
make[3]: Leaving directory `/tmp/mpi_install_tmp26482/openmpi-1.7rc8/opal/libltdl'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/tmp/mpi_install_tmp26482/openmpi-1.7rc8/opal/libltdl'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tmp/mpi_install_tmp26482/openmpi-1.7rc8/opal'
make: *** [all-recursive] Error 1

I guess this issue has more to do with the compiler than with Open MPI. Let me know if you need me to run any additional tests.

Regards,
Pavel Mezentsev
2013/2/28 Jeff Squyres (jsquyres) <jsquy...@cisco.com>

> On Feb 28, 2013, at 12:04 PM, Pavel Mezentsev <pavel.mezent...@gmail.com> wrote:
>
> > Do you mean the logs from failed attempts? They are enclosed. If you need the successful logs I'll need to make them again since the files from successful builds are deleted.
>
> You guessed right; I need the logs from the failed builds.
>
> It looks like your openf95 compiler is generating borked executables:
>
> -----
> configure:31019: checking KIND value of Fortran C_SIGNED_CHAR
> configure:31046: openf95 -o conftest conftest.f90 >&5
> configure:31046: $? = 0
> configure:31046: ./conftest
> ./configure: line 4343:  1234 Illegal instruction     (core dumped) ./conftest$ac_exeext
> configure:31046: $? = 132
> configure: program exited with status 132
> configure: failed program was:
> | program main
> |
> |   use, intrinsic :: ISO_C_BINDING
> |   open(unit = 7, file = "conftest.out")
> |   write(7, *) C_SIGNED_CHAR
> |   close(7)
> |
> | end
> -----
>
> There's no reason the above Fortran program should fail with "illegal instruction".
>
> > I am not using MXM. The results with the option you suggested are the same as before:
>
> We're investigating the latency issue.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] 1.7rc8 is posted
Do you mean the logs from failed attempts? They are enclosed. If you need the successful logs I'll need to make them again, since the files from the successful builds are deleted.

I am not using MXM. The results with the option you suggested are the same as before:

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         1.49         0.00
            1         1000         1.58         0.61
            2         1000         1.12         1.71
            4         1000         1.10         3.48
            8         1000         1.11         6.90
           16         1000         1.11        13.69
           32         1000         1.12        27.21
           64         1000         1.16        52.52
          128         1000         1.72        70.83
          256         1000         1.84       132.72
          512         1000         1.99       245.74
         1024         1000         2.25       433.92
         2048         1000         2.87       680.54
         4096         1000         3.52      1109.13
         8192         1000         4.68      1670.60
        16384         1000         9.66      1617.91
        32768         1000        14.30      2185.24
        65536          640        23.45      2665.33
       131072          320        35.99      3473.15
       262144          160        58.05      4306.65
       524288           80       101.94      4904.69
      1048576           40       188.65      5300.86
      2097152           20       526.05      3801.94
      4194304           10      1096.09      3649.32

#---------------------------------------------------
# Benchmarking PingPing
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         1.10         0.00
            1         1000         1.24         0.77
            2         1000         1.23         1.55
            4         1000         1.23         3.10
            8         1000         1.25         6.09
           16         1000         1.14        13.41
           32         1000         1.11        27.40
           64         1000         1.16        52.75
          128         1000         1.71        71.34
          256         1000         1.84       132.33
          512         1000         1.98       246.63
         1024         1000         2.27       429.26
         2048         1000         2.91       672.30
         4096         1000         3.52      1109.43
         8192         1000         4.80      1627.25
        16384         1000         9.98      1565.64
        32768         1000        14.70      2125.14
        65536          640        24.18      2584.97
       131072          320        37.33      3348.95
       262144          160        60.59      4125.82
       524288           80       105.83      4724.78
      1048576           40       197.82      5055.05
      2097152           20       791.35      2527.34
      4194304           10      1820.30      2197.44

Regards,
Pavel Mezentsev

2013/2/28 Jeff Squyres (jsquyres) <jsquy...@cisco.com>

> On Feb 27, 2013, at 7:36 PM, Pavel Mezentsev <pavel.mezent...@gmail.com> wrote:
>
> > I've tried the new rc. Here is what I got:
>
> Thanks for testing.
>
> > 1) I've successfully built it with intel-13.1 and gcc-4.7.2. But I've failed while using open64-4.5.2 and ekopath-5.0.0 (pathscale). The problems are in the fortran part. In each case I've used the following configuration line:
> >
> > CC=$CC CXX=$CXX F77=$F77 FC=$FC ./configure --prefix=$prefix --with-knem=$knem_path
> >
> > Open64 failed during configuration with the following:
> >
> > *** Fortran compiler
> > checking whether we are using the GNU Fortran compiler... yes
> > checking whether openf95 accepts -g... yes
> > configure: WARNING: Open MPI now ignores the F77 and FFLAGS environment variables; only the FC and FCFLAGS environment variables are used.
> > checking whether ln -s works... yes
> > checking if Fortran compiler works... yes
> > checking for extra arguments to build a shared library... none needed
> > checking for Fortran flag to compile .f files... none
> > checking for Fortran flag to compile .f90 files... none
> > checking to see if Fortran compilers need additional linker flags... none
> > checking external symbol convention... double underscore
> > checking if C and Fortran are link compatible... yes
> > checking to see if Fortran compiler likes the C++ exception flags... skipped (no C++ exceptions flags)
> > checking to see if mpifort compiler needs additional linker flags... none
> > checking if Fortran compiler supports CHARAC
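As a quick sanity check on the PingPong/PingPing tables above, IMB's Mbytes/sec column is simply the message size in MiB divided by the reported time (for PingPong, the t[usec] column is already the one-way half of the round trip). Here is a small sketch reproducing the column; imb_mbps is a hypothetical helper name, not part of IMB itself:

```python
def imb_mbps(nbytes: int, t_usec: float) -> float:
    """Throughput the way IMB reports it: MiB transferred per second."""
    if t_usec == 0.0:
        return 0.0
    return (nbytes / 2**20) / (t_usec / 1e6)

# The 4 MiB PingPong row above: imb_mbps(4194304, 1096.09) gives roughly 3649.3,
# matching the reported 3649.32 up to rounding of the printed time.
```

Small discrepancies against the printed column come only from IMB rounding t[usec] to two decimals before printing.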
Re: [OMPI devel] 1.7rc8 is posted
4.30 2795.45 2672.58
      4194304           10      5185.79      5451.20      5298.98

This time I only ran the test on 160 processes, but earlier I did more testing with 1.6 on different numbers of processes (from 16 to 320), and those tuned parameters helped almost every time. I don't know what the default parameters are tuned for, but perhaps it would be a good idea to change the defaults for the kind of system I use. I can perform some additional tests if necessary, or give more information on the problems I've come across.

Regards,
Pavel Mezentsev

2013/2/27 Jeff Squyres (jsquyres) <jsquy...@cisco.com>

> The goal is to release 1.7 (final) by the end of this week. New rc posted with fairly small changes:
>
> http://www.open-mpi.org/software/ompi/v1.7/
>
> - Fix wrong header file / compilation error in bcol
> - Support MXM STREAM for isend and irecv
> - Make sure "mpirun " fails with $status!=0
> - Bunches of cygwin minor fixes
> - Make sure the fortran compiler supports BIND(C) with LOGICAL for the F08 bindings
> - Fix --disable-mpi-io with the F08 bindings
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] v1.7.0rc7
I've tried to build it, but got different errors with different compilers.

With Intel (2011.5.220) and PGI (13.2) I get the following error:

  CC       bcol_iboffload_module.lo
bcol_iboffload_module.c(37): catastrophic error: cannot open source file "ompi/mca/common/netpatterns/common_netpatterns.h"
  #include "ompi/mca/common/netpatterns/common_netpatterns.h"

I failed to find that file anywhere among the sources.

With PathScale (4.0.12.1) I get the following:

  PPFC     mpi-f08-interfaces-callbacks.lo
module mpi_f08_interfaces_callbacks
       ^
pathf95-855 pathf95: ERROR MPI_F08_INTERFACES_CALLBACKS, File = mpi-f08-interfaces-callbacks.F90, Line = 9, Column = 8
  The compiler has detected errors in module "MPI_F08_INTERFACES_CALLBACKS". No module information file will be created for this module.
attribute_val_in,attribute_val_out,flag,ierror) &
                                                ^
pathf95-1691 pathf95: ERROR MPI_COMM_COPY_ATTR_FUNCTION, File = mpi-f08-interfaces-callbacks.F90, Line = 66, Column = 75
  For "FLAG", LOGICAL(KIND=4) not allowed with BIND(C)
attribute_val_in,attribute_val_out,flag,ierror) &
                                                ^
pathf95-1691 pathf95: ERROR MPI_WIN_COPY_ATTR_FUNCTION, File = mpi-f08-interfaces-callbacks.F90, Line = 91, Column = 74
  For "FLAG", LOGICAL(KIND=4) not allowed with BIND(C)
attribute_val_in,attribute_val_out,flag,ierror) &
                                                ^
pathf95-1691 pathf95: ERROR MPI_TYPE_COPY_ATTR_FUNCTION, File = mpi-f08-interfaces-callbacks.F90, Line = 116, Column = 75
  For "FLAG", LOGICAL(KIND=4) not allowed with BIND(C)
SUBROUTINE MPI_Grequest_cancel_function(extra_state,complete,ierror) &
                                                    ^
pathf95-1691 pathf95: ERROR MPI_GREQUEST_CANCEL_FUNCTION, File = mpi-f08-interfaces-callbacks.F90, Line = 195, Column = 53
  For "COMPLETE", LOGICAL(KIND=4) not allowed with BIND(C)

pathf95: PathScale(TM) Fortran Version 4.0.12.1 (f14) Tue Feb 26, 2013 06:33:40
pathf95: 429 source lines
pathf95: 5 Error(s), 0 Warning(s), 0 Other message(s), 0 ANSI(s)
pathf95: "explain pathf95-message number" gives more information about each message
make[2]: *** [mpi-f08-interfaces-callbacks.lo] Error 1
make[2]: Leaving directory `/tmp/mpi_install_tmp21558/openmpi-1.7rc7/ompi/mpi/fortran/base'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tmp/mpi_install_tmp21558/openmpi-1.7rc7/ompi'
make: *** [all-recursive] Error 1

I am not a Fortran guy and don't really know what the problem is here. In all cases I configured only by setting the compilers in the environment variables and setting --prefix. I managed to build 1.6.3 with all three mentioned compilers using the same configuration lines without any errors. I'm not sure about the PathScale problem, but the first one looks like a real error. Or did I miss something?

Regards,
Pavel Mezentsev

2013/2/26 Ralph Castain <r...@open-mpi.org>

> On Feb 25, 2013, at 1:40 PM, marco atzeri <marco.atz...@gmail.com> wrote:
>
> > On 2/23/2013 11:45 PM, Ralph Castain wrote:
> >> This release candidate is the last one we expect to have before release, so please test it. Can be downloaded from the usual place:
> >>
> >> http://www.open-mpi.org/software/ompi/v1.7/
> >>
> >> Latest changes include:
> >>
> >> * update of the alps/lustre configure code
> >> * fixed solaris hwloc code
> >> * various mxm updates
> >> * removed java bindings (delayed until later release)
> >> * improved the --report-bindings output
> >> * a variety of minor cleanups
> >
> > any reason to not include the cygwin patches added to 1.6.4 ?
>
> I don't believe they were ever CMR'd for 1.7.0, so they were never moved.
>
> > Marco
Re: [OMPI devel] algorithm selection in open mpi
Is there a way to specify the collective algorithm depending on the size of the message and the number of processes?

Regards,
Pavel Mezentsev

2012/4/3 George Bosilca <bosi...@eecs.utk.edu>

> Roswan,
>
> There are simpler solutions to achieve this. We have a built-in mechanism to select a specific collective implementation. Here is what you have to add in your .openmpi/mca-params.conf (or as MCA arguments on the command line):
>
> coll_tuned_use_dynamic_rules = 1
> coll_tuned_bcast_algorithm = 6
>
> The first one activates the dynamic selection of collective algorithms, while the second one forces all broadcasts to use algorithm 6 (binomial tree). Btw, once you set the first one, do a quick "ompi_info --param coll tuned" to see the list of all possible options for the collective algorithm selection.
>
>   george.
>
> On Apr 2, 2012, at 23:10, roswan ismail wrote:
>
> Hi all,
>
> I am Roswan Ismail from Malaysia. I am focusing on MPI communication performance on a quad-core cluster at my university. I used Open MPI 1.4.3, and measurements were done using the scampi benchmark.
>
> As I know, Open MPI uses multiple algorithms to broadcast data (MPI_BCAST), such as binomial, pipeline, binary tree, basic linear and split binary tree. These algorithms are chosen based on message size and communicator size. For example, binomial is used when the message to be broadcast is small, while pipeline is used for broadcasting a large message.
>
> What I want to do now is to use a fixed algorithm, i.e. binomial, for all message sizes. I want to see and compare the results with the default results. So I modified coll_tuned_decision_fixed.c, which is located in openmpi-1.4.3/ompi/mca/coll/tuned, by returning the binomial algorithm under all conditions. Then I recompiled the files, but the problem is that the results obtained are the same as the default. It seems I did not make any changes to the code.
>
> So could you guys tell me the right way to do that?
>
> Many thanks
>
> Roswan Binti Ismail,
> FTMK,
> Univ. Pend. Sultan Idris,
> Tg Malim, Perak.
> Pej: 05-4505173
> H/P: 0123588047
> iewa...@gmail.com
> ros...@ftmk.upsi.edu.my
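To follow up on the question above: beyond forcing one algorithm globally, the tuned component can also read a rules file (via the coll_tuned_dynamic_rules_filename MCA parameter) that picks an algorithm per communicator size and per message size. The sketch below is illustrative only: the numeric collective and algorithm IDs and the exact file format vary between Open MPI releases, and the real parser expects bare numbers (the trailing comments here are for the reader), so verify against ompi/mca/coll/tuned in your own source tree.

```
1              number of collectives described in this file
7              collective ID (assumed here: bcast)
2              number of communicator-size rules
8              rule 1: communicators of at least 8 processes
2              ... with two message-size rules
0 3 0 0        from 0 bytes: algorithm 3, faninout 0, segment size 0
16384 6 0 0    from 16 KiB: algorithm 6 (assumed: binomial tree)
64             rule 2: communicators of at least 64 processes
1              ... with one message-size rule
0 6 0 0        binomial for all message sizes
```

Enable it with: mpirun --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_dynamic_rules_filename ./bcast.rules ...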
Re: [OMPI devel] barrier problem
I took the best result from each version; that's why different algorithm numbers were chosen. I've studied the matter a bit further, and here's what I got.

With Open MPI 1.5.4 these are the average times:

/opt/openmpi-1.5.4/intel12/bin/mpirun -x OMP_NUM_THREADS=1 -hostfile hosts_all2all_4 -npernode 32 --mca btl openib,sm,self -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm $i -np 128 openmpi-1.5.4/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 128 barrier

algorithm 0 - 71.78
algorithm 3 - 69.39
algorithm 6 - 69.05

If I pin the processes with the following script:

#!/bin/bash
s=$(($OMPI_COMM_WORLD_NODE_RANK))
numactl --physcpubind=$((s)) --localalloc openmpi-1.5.4/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 128 barrier

then the results improve:

algorithm 0 - 51.96
algorithm 3 - 52.39
algorithm 6 - 28.64

On openmpi-1.5.5rc3 without any binding the results are awful (14964.15 is the best). If I use the --bind-to-core flag, then the results are almost the same as in 1.5.4 with the binding script:

algorithm 0 - 52.85
algorithm 3 - 52.69
algorithm 6 - 23.34

So almost everything seems to work fine now. The only problem left is that algorithm number 5 hangs.

2012/3/28 Jeffrey Squyres <jsquy...@cisco.com>

> FWIW:
>
> 1. There were definitely some issues with binding to cores and process layouts on Opterons that should be fixed in the 1.5.5 that was finally released today.
>
> 2. It is strange that the performance of barrier is so much different between 1.5.4 and 1.5.5. Is there a reason you were choosing different algorithm numbers between the two? (One of your command lines had "coll_tuned_barrier_algorithm 1", the other had "coll_tuned_barrier_algorithm 3".)
>
> On Mar 23, 2012, at 10:11 AM, Shamis, Pavel wrote:
>
> > Pavel,
> >
> > MVAPICH implements multicore-optimized collectives, which perform substantially better than the default algorithms.
> > FYI, the ORNL team works on a new high-performance collectives framework for OMPI. The framework provides a significant boost in collectives performance.
> >
> > Regards,
> >
> > Pavel (Pasha) Shamis
> > ---
> > Application Performance Tools Group
> > Computer Science and Math Division
> > Oak Ridge National Laboratory
> >
> > On Mar 23, 2012, at 9:17 AM, Pavel Mezentsev wrote:
> >
> > I've been comparing 1.5.4 and 1.5.5rc3 with the same parameters; that's why I didn't use --bind-to-core. I checked, and using --bind-to-core improved the result compared to 1.5.4:
> > #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
> >         1000        84.96        85.08        85.02
> >
> > So I guess with 1.5.5 the processes move from core to core within a node even though I use all cores, right? Then why does 1.5.4 behave differently?
> >
> > I need --bind-to-core in some cases, and that's why I need 1.5.5rc3 instead of the more stable 1.5.4. I know that I can use numactl explicitly, but --bind-to-core is more convenient :)
> >
> > 2012/3/23 Ralph Castain <r...@open-mpi.org>
> > I don't see where you told OMPI to --bind-to-core. We don't automatically bind, so you have to explicitly tell us to do so.
> >
> > On Mar 23, 2012, at 6:20 AM, Pavel Mezentsev wrote:
> >
> >> Hello
> >>
> >> I'm doing some testing with IMB and discovered a strange thing:
> >>
> >> Since I have a system with new AMD Opteron 6276 processors, I'm using 1.5.5rc3, since it supports binding to cores.
> >>
> >> But when I run the barrier test from the Intel MPI Benchmarks, the best I get is:
> >> #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
> >>          598     15159.56     15211.05     15184.70
> >> (/opt/openmpi-1.5.5rc3/intel12/bin/mpirun -x OMP_NUM_THREADS=1 -hostfile hosts_all2all_2 -npernode 32 --mca btl openib,sm,self -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm 1 -np 256 openmpi-1.5.5rc3/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 256 barrier)
> >>
> >> And with Open MPI 1.5.4 the result is much better:
> >> #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
> >>         1000       113.23       113.33       113.28
> >> (/opt/openmpi-1.5.4/intel12/bin/mpirun -x OMP_NUM_THREADS=1 -hostfile hosts_all2all_2 -npernode 32 --mca btl openib,sm,self -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm 3 -np 256 openmpi-1.5.4/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 256 barrier)
> >>
> >> and still I couldn't come close to the result I got with MVAPICH:
> >> #repetitions  t_min[usec]  t_max[usec]  t_avg[
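The per-rank numactl pinning from the first message in this thread generalizes easily. Below is a hedged Python sketch of the same idea; pin_cmd is a hypothetical helper, and it assumes the OMPI_COMM_WORLD_NODE_RANK variable that Open MPI 1.5.x exports to each launched process, with a simple one-core-per-local-rank mapping:

```python
import os

def pin_cmd(argv, local_rank=None):
    """Wrap a command with numactl so each local rank is pinned to its own core."""
    if local_rank is None:
        # Open MPI 1.5.x exports the node-local rank in this variable.
        local_rank = int(os.environ.get("OMPI_COMM_WORLD_NODE_RANK", "0"))
    return ["numactl", "--physcpubind=%d" % local_rank, "--localalloc"] + list(argv)

# Example: pin_cmd(["./IMB-MPI1", "barrier"], local_rank=3)
# -> ['numactl', '--physcpubind=3', '--localalloc', './IMB-MPI1', 'barrier']
```

On Interlagos-style processors (like the Opteron 6276 discussed here), a smarter mapping might space ranks across Bulldozer modules; that is left as a site-specific tweak.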
Re: [OMPI devel] barrier problem
I've been comparing 1.5.4 and 1.5.5rc3 with the same parameters; that's why I didn't use --bind-to-core. I checked, and using --bind-to-core improved the result compared to 1.5.4:

#repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
        1000        84.96        85.08        85.02

So I guess with 1.5.5 the processes move from core to core within a node even though I use all cores, right? Then why does 1.5.4 behave differently?

I need --bind-to-core in some cases, and that's why I need 1.5.5rc3 instead of the more stable 1.5.4. I know that I can use numactl explicitly, but --bind-to-core is more convenient :)

2012/3/23 Ralph Castain <r...@open-mpi.org>

> I don't see where you told OMPI to --bind-to-core. We don't automatically bind, so you have to explicitly tell us to do so.
>
> On Mar 23, 2012, at 6:20 AM, Pavel Mezentsev wrote:
>
> > Hello
> >
> > I'm doing some testing with IMB and discovered a strange thing:
> >
> > Since I have a system with new AMD Opteron 6276 processors, I'm using 1.5.5rc3, since it supports binding to cores.
> >
> > But when I run the barrier test from the Intel MPI Benchmarks, the best I get is:
> > #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
> >          598     15159.56     15211.05     15184.70
> > (/opt/openmpi-1.5.5rc3/intel12/bin/mpirun -x OMP_NUM_THREADS=1 -hostfile hosts_all2all_2 -npernode 32 --mca btl openib,sm,self -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm 1 -np 256 openmpi-1.5.5rc3/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 256 barrier)
> >
> > And with Open MPI 1.5.4 the result is much better:
> > #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
> >         1000       113.23       113.33       113.28
> > (/opt/openmpi-1.5.4/intel12/bin/mpirun -x OMP_NUM_THREADS=1 -hostfile hosts_all2all_2 -npernode 32 --mca btl openib,sm,self -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm 3 -np 256 openmpi-1.5.4/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 256 barrier)
> >
> > and still I couldn't come close to the result I got with MVAPICH:
> > #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
> >         1000        17.51        17.53        17.53
> > (/opt/mvapich2-1.8/intel12/bin/mpiexec.hydra -env OMP_NUM_THREADS 1 -hostfile hosts_all2all_2 -np 256 mvapich2-1.8/intel12/IMB-MPI1 -mem 2 -off_cache 16,64 -msglog 1:16 -npmin 256 barrier)
> >
> > I don't know if this is a bug or if I'm doing something wrong. Is there a way to improve my results?
> >
> > Best regards,
> > Pavel Mezentsev
[OMPI devel] barrier problem
Hello

I'm doing some testing with IMB and discovered a strange thing.

Since I have a system with new AMD Opteron 6276 processors, I'm using 1.5.5rc3, since it supports binding to cores.

But when I run the barrier test from the Intel MPI Benchmarks, the best I get is:

#repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
         598     15159.56     15211.05     15184.70

(/opt/openmpi-1.5.5rc3/intel12/bin/mpirun -x OMP_NUM_THREADS=1 -hostfile hosts_all2all_2 -npernode 32 --mca btl openib,sm,self -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm 1 -np 256 openmpi-1.5.5rc3/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 256 barrier)

And with Open MPI 1.5.4 the result is much better:

#repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
        1000       113.23       113.33       113.28

(/opt/openmpi-1.5.4/intel12/bin/mpirun -x OMP_NUM_THREADS=1 -hostfile hosts_all2all_2 -npernode 32 --mca btl openib,sm,self -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm 3 -np 256 openmpi-1.5.4/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 256 barrier)

and still I couldn't come close to the result I got with MVAPICH:

#repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
        1000        17.51        17.53        17.53

(/opt/mvapich2-1.8/intel12/bin/mpiexec.hydra -env OMP_NUM_THREADS 1 -hostfile hosts_all2all_2 -np 256 mvapich2-1.8/intel12/IMB-MPI1 -mem 2 -off_cache 16,64 -msglog 1:16 -npmin 256 barrier)

I don't know if this is a bug or if I'm doing something wrong. Is there a way to improve my results?

Best regards,
Pavel Mezentsev