Re: [OMPI devel] 1.7rc8 is posted

2013-02-28 Thread Pavel Mezentsev
Regarding Open64: the strange thing is that I need two different versions of
the compiler, one for the Opteron family 15h and one for the generic x86-64
architecture. This makes things quite complicated since my head node doesn't
have an Opteron family 15h processor. You can have a look at this topic:
http://devgurus.amd.com/thread/160180
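
For the record, the two builds could be selected per node with something like
this (a sketch only; the install paths are hypothetical, and family 15h shows
up as decimal 21 in /proc/cpuinfo):

# pick the Open64 build that matches the local CPU
if grep -q '^cpu family.*: 21$' /proc/cpuinfo; then
    export PATH=/opt/open64-4.5.2-fam15h/bin:$PATH
else
    export PATH=/opt/open64-4.5.2-generic/bin:$PATH
fi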

I've tried building the same version on a node with 6380 processors.
Configuration was successful. But make failed in the following way:
libtool: compile:  opencc -DHAVE_CONFIG_H -I. -DLTDLOPEN=libltdlc
"-DLT_CONFIG_H=" -DLTDL -I. -I. -Ilibltdl -I./libltdl -I./libltdl
-I/tmp/mpi_install_tmp26482/openmpi-1.7rc8/opal/mca/hwloc/hwloc151/hwloc/include
-I/tmp/mpi_install_tmp26482/openmpi-1.7rc8/opal/mca/event/libevent2019/libevent
-I/tmp/mpi_install_tmp26482/openmpi-1.7rc8/opal/mca/event/libevent2019/libevent/include
-I/usr/include/infiniband -I/usr/include/infiniband
-I/usr/include/infiniband -I/usr/include/infiniband
-I/usr/include/infiniband -I/usr/include/infiniband -O3 -DNDEBUG
-fvisibility=hidden -MT libltdlc_la-ltdl.lo -MD -MP -MF
.deps/libltdlc_la-ltdl.Tpo -c ltdl.c  -fPIC -DPIC -o
.libs/libltdlc_la-ltdl.o
/tmp/ccspin#.cVv00f.s: Assembler messages:
/tmp/ccspin#.cVv00f.s:860: Error: no such instruction: `bextr
$257,%esi,%esi'
/tmp/ccspin#.cVv00f.s:868: Error: no such instruction: `bextr
$258,%edi,%edi'
/tmp/ccspin#.cVv00f.s:876: Error: no such instruction: `bextr
$259,%eax,%eax'
make[3]: *** [libltdlc_la-ltdl.lo] Error 1
make[3]: Leaving directory
`/tmp/mpi_install_tmp26482/openmpi-1.7rc8/opal/libltdl'
make[2]: *** [all] Error 2
make[2]: Leaving directory
`/tmp/mpi_install_tmp26482/openmpi-1.7rc8/opal/libltdl'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tmp/mpi_install_tmp26482/openmpi-1.7rc8/opal'
make: *** [all-recursive] Error 1

I guess that this issue has more to do with the compiler than with Open MPI.
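
The failing instruction is the immediate form of bextr, which as far as I can
tell is an AMD TBM instruction specific to family 15h, so the assembler on the
build node is probably too old to know it. A quick check (assuming opencc
hands its output to the GNU assembler from binutils):

# feed one of the offending instructions from the log to the system assembler
echo 'bextr $257,%esi,%esi' | as -o /dev/null -
# "no such instruction" here means this binutils predates TBM support;
# updating binutils (or making opencc target generic x86-64) should avoid it.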


Let me know if you need me to run any additional tests.

Regards, Pavel Mezentsev.



2013/2/28 Jeff Squyres (jsquyres) <jsquy...@cisco.com>

> On Feb 28, 2013, at 12:04 PM, Pavel Mezentsev <pavel.mezent...@gmail.com>
> wrote:
>
> > Do you mean the logs from failed attempts? They are enclosed. If you
> need the successful logs I'll need to make them again since the files from
> successful builds are deleted.
>
> You guessed right; I need the logs from the failed builds.
>
> It looks like your openf95 compiler is generating borked executables:
>
> -
> configure:31019: checking KIND value of Fortran C_SIGNED_CHAR
> configure:31046: openf95 -o conftest conftest.f90  >&5
> configure:31046: $? = 0
> configure:31046: ./conftest
> ./configure: line 4343:  1234 Illegal instruction (core dumped)
> ./conftest$ac_exeext
> configure:31046: $? = 132
> configure: program exited with status 132
> configure: failed program was:
> |   program main
> |
> | use, intrinsic :: ISO_C_BINDING
> | open(unit = 7, file = "conftest.out")
> | write(7, *) C_SIGNED_CHAR
> | close(7)
> |
> |   end
> -
>
> There's no reason the above Fortran program should fail with "illegal
> instruction".
>
> > I am not using  MXM. The results with the option you suggested are the
> same as before:
>
> We're investigating the latency issue.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>


Re: [OMPI devel] 1.7rc8 is posted

2013-02-28 Thread Pavel Mezentsev
Do you mean the logs from failed attempts? They are enclosed. If you need
the successful logs I'll need to make them again since the files from
successful builds are deleted.

I am not using  MXM. The results with the option you suggested are the same
as before:
#---
# Benchmarking PingPong
# #processes = 2
#---
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         1.49         0.00
            1         1000         1.58         0.61
            2         1000         1.12         1.71
            4         1000         1.10         3.48
            8         1000         1.11         6.90
           16         1000         1.11        13.69
           32         1000         1.12        27.21
           64         1000         1.16        52.52
          128         1000         1.72        70.83
          256         1000         1.84       132.72
          512         1000         1.99       245.74
         1024         1000         2.25       433.92
         2048         1000         2.87       680.54
         4096         1000         3.52      1109.13
         8192         1000         4.68      1670.60
        16384         1000         9.66      1617.91
        32768         1000        14.30      2185.24
        65536          640        23.45      2665.33
       131072          320        35.99      3473.15
       262144          160        58.05      4306.65
       524288           80       101.94      4904.69
      1048576           40       188.65      5300.86
      2097152           20       526.05      3801.94
      4194304           10      1096.09      3649.32

#---
# Benchmarking PingPing
# #processes = 2
#---
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         1.10         0.00
            1         1000         1.24         0.77
            2         1000         1.23         1.55
            4         1000         1.23         3.10
            8         1000         1.25         6.09
           16         1000         1.14        13.41
           32         1000         1.11        27.40
           64         1000         1.16        52.75
          128         1000         1.71        71.34
          256         1000         1.84       132.33
          512         1000         1.98       246.63
         1024         1000         2.27       429.26
         2048         1000         2.91       672.30
         4096         1000         3.52      1109.43
         8192         1000         4.80      1627.25
        16384         1000         9.98      1565.64
        32768         1000        14.70      2125.14
        65536          640        24.18      2584.97
       131072          320        37.33      3348.95
       262144          160        60.59      4125.82
       524288           80       105.83      4724.78
      1048576           40       197.82      5055.05
      2097152           20       791.35      2527.34
      4194304           10      1820.30      2197.44

Regards, Pavel Mezentsev.


2013/2/28 Jeff Squyres (jsquyres) <jsquy...@cisco.com>

> On Feb 27, 2013, at 7:36 PM, Pavel Mezentsev <pavel.mezent...@gmail.com>
> wrote:
>
> > I've tried the new rc. Here is what I got:
>
> Thanks for testing.
>
> > 1) I've successfully built it with intel-13.1 and gcc-4.7.2. But I failed
> > with open64-4.5.2 and ekopath-5.0.0 (PathScale). The problems are in the
> > Fortran part. In each case I used the following configuration line:
> > CC=$CC CXX=$CXX F77=$F77 FC=$FC ./configure --prefix=$prefix
> --with-knem=$knem_path
> > Open64 failed during configuration with the following:
> > *** Fortran compiler
> > checking whether we are using the GNU Fortran compiler... yes
> > checking whether openf95 accepts -g... yes
> > configure: WARNING: Open MPI now ignores the F77 and FFLAGS environment
> variables; only the FC and FCFLAGS environment variables are used.
> > checking whether ln -s works... yes
> > checking if Fortran compiler works... yes
> > checking for extra arguments to build a shared library... none needed
> > checking for Fortran flag to compile .f files... none
> > checking for Fortran flag to compile .f90 files... none
> > checking to see if Fortran compilers need additional linker flags... none
> > checking  external symbol convention... double underscore
> > checking if C and Fortran are link compatible... yes
> > checking to see if Fortran compiler likes the C++ exception flags...
> skipped (no C++ exceptions flags)
> > checking to see if mpifort compiler needs additional linker flags... none
> > checking if Fortran compiler supports CHARAC

Re: [OMPI devel] 1.7rc8 is posted

2013-02-27 Thread Pavel Mezentsev
4.30  2795.45  2672.58
  4194304   10  5185.79  5451.20  5298.98

This time I only ran the test on 160 processes, but earlier I did more
testing with 1.6 on different numbers of processes (from 16 to 320), and
those tuned parameters helped almost every time. I don't know what the
default parameters are tuned for, but perhaps it would be a good idea to
change the defaults for the kind of system I use.



I can perform some additional tests if necessary or give more information
on the problems that I've come across.

Regards, Pavel Mezentsev.


2013/2/27 Jeff Squyres (jsquyres) <jsquy...@cisco.com>

> The goal is to release 1.7 (final) by the end of this week.  New rc posted
> with fairly small changes:
>
> http://www.open-mpi.org/software/ompi/v1.7/
>
> - Fix wrong header file / compilation error in bcol
> - Support MXM STREAM for isend and irecv
> - Make sure "mpirun " fails with $status!=0
> - Bunches of cygwin minor fixes
> - Make sure the fortran compiler supports BIND(C) with LOGICAL for the F08
> bindings
> - Fix --disable-mpi-io with the F08 bindings
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>


Re: [OMPI devel] v1.7.0rc7

2013-02-25 Thread Pavel Mezentsev
I've tried to build it but got different errors with different compilers.

With Intel (2011.5.220) and PGI (13.2) I get the following error:
CC   bcol_iboffload_module.lo
bcol_iboffload_module.c(37): catastrophic error: cannot open source file
"ompi/mca/common/netpatterns/common_netpatterns.h"
  #include "ompi/mca/common/netpatterns/common_netpatterns.h"

I failed to find that file anywhere among the sources.
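
In case it helps, a quick way to double-check (a sketch; the unpack path is
the one from the PathScale log below):

cd /tmp/mpi_install_tmp21558/openmpi-1.7rc7
find . -name common_netpatterns.h           # comes back empty for me
grep -rl common_netpatterns.h ompi          # shows the files that include it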

With PathScale (4.0.12.1) I get the following:
  PPFC mpi-f08-interfaces-callbacks.lo

module mpi_f08_interfaces_callbacks
   ^
pathf95-855 pathf95: ERROR MPI_F08_INTERFACES_CALLBACKS, File =
mpi-f08-interfaces-callbacks.F90, Line = 9, Column = 8
  The compiler has detected errors in module
"MPI_F08_INTERFACES_CALLBACKS".  No module information file will be created
for this module.


 attribute_val_in,attribute_val_out,flag,ierror) &
  ^

pathf95-1691 pathf95: ERROR MPI_COMM_COPY_ATTR_FUNCTION, File =
mpi-f08-interfaces-callbacks.F90, Line = 66, Column = 75
  For "FLAG", LOGICAL(KIND=4) not allowed with BIND(C)


attribute_val_in,attribute_val_out,flag,ierror) &
 ^

pathf95-1691 pathf95: ERROR MPI_WIN_COPY_ATTR_FUNCTION, File =
mpi-f08-interfaces-callbacks.F90, Line = 91, Column = 74
  For "FLAG", LOGICAL(KIND=4) not allowed with BIND(C)


 attribute_val_in,attribute_val_out,flag,ierror) &
  ^

pathf95-1691 pathf95: ERROR MPI_TYPE_COPY_ATTR_FUNCTION, File =
mpi-f08-interfaces-callbacks.F90, Line = 116, Column = 75
  For "FLAG", LOGICAL(KIND=4) not allowed with BIND(C)

SUBROUTINE MPI_Grequest_cancel_function(extra_state,complete,ierror) &
^
pathf95-1691 pathf95: ERROR MPI_GREQUEST_CANCEL_FUNCTION, File =
mpi-f08-interfaces-callbacks.F90, Line = 195, Column = 53
  For "COMPLETE", LOGICAL(KIND=4) not allowed with BIND(C)

pathf95: PathScale(TM) Fortran Version 4.0.12.1 (f14) Tue Feb 26, 2013
 06:33:40
pathf95: 429 source lines
pathf95: 5 Error(s), 0 Warning(s), 0 Other message(s), 0 ANSI(s)
pathf95: "explain pathf95-message number" gives more information about each
message
make[2]: *** [mpi-f08-interfaces-callbacks.lo] Error 1
make[2]: Leaving directory
`/tmp/mpi_install_tmp21558/openmpi-1.7rc7/ompi/mpi/fortran/base'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tmp/mpi_install_tmp21558/openmpi-1.7rc7/ompi'
make: *** [all-recursive] Error 1

I am not a Fortran guy and don't really know what the problem is here.

In all cases I configured only by setting the compilers in the environment
variables and setting --prefix. I managed to build 1.6.3 with all three of
the mentioned compilers using the same configuration lines, without any
errors.
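
For concreteness, the configuration lines were of this shape (shown here with
the Intel compilers; the prefix is just an example):

CC=icc CXX=icpc F77=ifort FC=ifort ./configure --prefix=/opt/openmpi-1.7rc7/intel12
make && make install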

I'm not sure about the problem with PathScale, but the first problem seems
to be a real error. Or did I miss something?

Regards, Pavel Mezentsev.


2013/2/26 Ralph Castain <r...@open-mpi.org>

>
> On Feb 25, 2013, at 1:40 PM, marco atzeri <marco.atz...@gmail.com> wrote:
>
> > On 2/23/2013 11:45 PM, Ralph Castain wrote:
> >> This release candidate is the last one we expect to have before
> release, so please test it. Can be downloaded from the usual place:
> >>
> >> http://www.open-mpi.org/software/ompi/v1.7/
> >>
> >> Latest changes include:
> >>
> >> * update of the alps/lustre configure code
> >> * fixed solaris hwloc code
> >> * various mxm updates
> >> * removed java bindings (delayed until later release)
> >> * improved the --report-bindings output
> >> * a variety of minor cleanups
> >>
> >
> > any reason to not include the cygwin patches added to 1.6.4 ?
>
> I don't believe they were ever CMR'd for 1.7.0, so they were never moved
>
> >
> > Marco
> >
>
>
>


Re: [OMPI devel] algorithm selection in open mpi

2012-04-03 Thread Pavel Mezentsev
Is there a way to specify the collective algorithm depending on the size of
the message and the number of processes?

Regards,
Pavel Mezentsev

2012/4/3 George Bosilca <bosi...@eecs.utk.edu>

> Roswan,
>
> There are simpler solutions to achieve this. We have a built-in mechanism
> to select a specific collective implementation. Here is what you have to
> add to your .openmpi/mca-params.conf (or pass as an MCA argument on the
> command line):
>
> coll_tuned_use_dynamic_rules = 1
> coll_tuned_bcast_algorithm = 6
>
> The first one activates the dynamic selection of collective algorithms,
> while the second one forces all broadcasts to use algorithm 6 (binomial
> tree). Btw, once you set the first one, do a quick "ompi_info --param coll
> tuned" to see the list of all possible options for collective algorithm
> selection.
>
>   george.
>
> On Apr 2, 2012, at 23:10 , roswan ismail wrote:
>
> Hi all..
>
> I am Roswan Ismail from Malaysia. I am focusing on MPI communication
> performance on a quad-core cluster at my university. I used Open MPI 1.4.3
> and measurements were done using the scampi benchmark.
>
> As I know, Open MPI uses multiple algorithms to broadcast data (MPI_BCAST),
> such as binomial, pipeline, binary tree, basic linear and split binary
> tree. These algorithms are chosen based on message size and communicator
> size. For example, binomial is used when the message to be broadcast is
> small, while pipeline is used for broadcasting a large message.
>
> What I want to do now is to use a fixed algorithm, i.e. binomial, for all
> message sizes. I want to see and compare the results with the default
> results. So I modified coll_tuned_decision_fixed.c, which is located in
> openmpi-1.4.3/ompi/mca/coll/tuned, to return the binomial algorithm under
> all conditions. Then I recompiled the files, but the problem is that the
> results obtained are the same as the default. It seems my changes had no
> effect.
>
> So could you guys tell me the right way to do this?
>
> Many thanks
>
> **
> *Roswan Binti Ismail,
> FTMK,
> Univ. Pend. Sultan Idris,
> Tg Malim, Perak.
> Pej: 05-4505173
> H/P: 0123588047
> iewa...@gmail.com <iewanis1402@hotmail>
> ros...@ftmk.upsi.edu.my
> ***
>
>
>
>
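
For reference, a minimal sketch of the mechanism George describes above (the
file location and parameter names are quoted from his reply; the command-line
form uses the same --mca syntax that appears elsewhere in this thread):

# $HOME/.openmpi/mca-params.conf
coll_tuned_use_dynamic_rules = 1
# 6 = binomial tree
coll_tuned_bcast_algorithm = 6

# Equivalently, on the mpirun command line:
#   mpirun --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_bcast_algorithm 6 ...
# To list the available algorithm numbers:
#   ompi_info --param coll tuned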


Re: [OMPI devel] barrier problem

2012-03-28 Thread Pavel Mezentsev
I took the best result from each version, that's why different algorithm
numbers were chosen.

I've studied the matter a bit further and here's what I got:
with openmpi 1.5.4 these are the average times:
/opt/openmpi-1.5.4/intel12/bin/mpirun -x OMP_NUM_THREADS=1  -hostfile
hosts_all2all_4 -npernode 32 --mca btl openib,sm,self -mca
coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm $i -np 128
openmpi-1.5.4/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 128
barrier
0 - 71.78
3 - 69.39
6 - 69.05

If I pin the processes with the following script:
#!/bin/bash

s=$(($OMPI_COMM_WORLD_NODE_RANK))

numactl --physcpubind=$((s)) --localalloc openmpi-1.5.4/intel12/IMB-MPI1
-off_cache 16,64 -msglog 1:16 -npmin 128 barrier
then the results improve:
0 - 51.96
3 - 52.39
6 - 28.64
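
(To be clear, the wrapper above replaces the IMB binary in the same mpirun
line as before; the script name is made up here:)

/opt/openmpi-1.5.4/intel12/bin/mpirun -x OMP_NUM_THREADS=1 -hostfile hosts_all2all_4 \
    -npernode 32 --mca btl openib,sm,self -mca coll_tuned_use_dynamic_rules 1 \
    -mca coll_tuned_barrier_algorithm $i -np 128 ./bind_core.sh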

On openmpi-1.5.5rc3 without any binding the results are awful (14964.15 is
the best).
If I use the '--bind-to-core' flag then the results are almost the same as
in 1.5.4 with the binding script:
0 - 52.85
3 - 52.69
6 - 23.34

So almost everything seems to work fine now. The only problem left is that
algorithm number 5 hangs.

2012/3/28 Jeffrey Squyres <jsquy...@cisco.com>

> FWIW:
>
> 1. There were definitely some issues with binding to cores and process
> layouts on Opterons that should be fixed in the 1.5.5 that was finally
> released today.
>
> 2. It is strange that the performance of barrier is so much different
> between 1.5.4 and 1.5.5.  Is there a reason you were choosing different
> algorithm numbers between the two?  (one of your command lines had
> "coll_tuned_barrier_algorithm 1", the other had
> "coll_tuned_barrier_algorithm 3").
>
>
> On Mar 23, 2012, at 10:11 AM, Shamis, Pavel wrote:
>
> > Pavel,
> >
> > Mvapich implements multicore-optimized collectives, which perform
> > substantially better than the default algorithms.
> > FYI, the ORNL team is working on a new high-performance collectives
> > framework for OMPI. The framework provides a significant boost in
> > collectives performance.
> >
> > Regards,
> >
> > Pavel (Pasha) Shamis
> > ---
> > Application Performance Tools Group
> > Computer Science and Math Division
> > Oak Ridge National Laboratory
> >
> >
> >
> >
> >
> >
> > On Mar 23, 2012, at 9:17 AM, Pavel Mezentsev wrote:
> >
> > I've been comparing 1.5.4 and 1.5.5rc3 with the same parameters, that's
> > why I didn't use --bind-to-core. I checked, and using --bind-to-core
> > improved the result compared to 1.5.4:
> > #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
> >  1000    84.96    85.08    85.02
> >
> > So I guess with 1.5.5 the processes move from core to core within a node
> > even though I use all cores, right? Then why does 1.5.4 behave differently?
> >
> > I need --bind-to-core in some cases and that's why I need 1.5.5rc3
> > instead of the more stable 1.5.4. I know that I can use numactl explicitly
> > but --bind-to-core is more convenient :)
> >
> > 2012/3/23 Ralph Castain <r...@open-mpi.org<mailto:r...@open-mpi.org>>
> > I don't see where you told OMPI to --bind-to-core. We don't
> automatically bind, so you have to explicitly tell us to do so.
> >
> > On Mar 23, 2012, at 6:20 AM, Pavel Mezentsev wrote:
> >
> >> Hello
> >>
> >> I'm doing some testing with IMB and discovered a strange thing:
> >>
> >> Since I have a system with the new AMD Opteron 6276 processors, I'm using
> >> 1.5.5rc3 because it supports binding to cores.
> >>
> >> But when I run the barrier test from the Intel MPI Benchmarks, the best I
> >> get is:
> >> #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
> >>  598 15159.56 15211.05 15184.70
> >> (/opt/openmpi-1.5.5rc3/intel12/bin/mpirun -x OMP_NUM_THREADS=1
>  -hostfile hosts_all2all_2 -npernode 32 --mca btl openib,sm,self -mca
> coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm 1 -np 256
> openmpi-1.5.5rc3/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 256
> barrier)
> >>
> >> And with openmpi 1.5.4 the result is much better:
> >> #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
> >> 1000   113.23   113.33   113.28
> >>
> >> (/opt/openmpi-1.5.4/intel12/bin/mpirun -x OMP_NUM_THREADS=1  -hostfile
> hosts_all2all_2 -npernode 32 --mca btl openib,sm,self -mca
> coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm 3 -np 256
> openmpi-1.5.4/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 256
> barrier)
> >>
> >> and still I couldn't come close to the result I got with mvapich:
> >> #repetitions  t_min[usec]  t_max[usec]  t_avg[

Re: [OMPI devel] barrier problem

2012-03-23 Thread Pavel Mezentsev
I've been comparing 1.5.4 and 1.5.5rc3 with the same parameters, that's why
I didn't use --bind-to-core. I checked, and using --bind-to-core improved
the result compared to 1.5.4:
#repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
 1000    84.96    85.08    85.02

So I guess with 1.5.5 the processes move from core to core within a node even
though I use all cores, right? Then why does 1.5.4 behave differently?

I need --bind-to-core in some cases and that's why I need 1.5.5rc3 instead
of the more stable 1.5.4. I know that I can use numactl explicitly but
--bind-to-core is more convenient :)

2012/3/23 Ralph Castain <r...@open-mpi.org>

> I don't see where you told OMPI to --bind-to-core. We don't automatically
> bind, so you have to explicitly tell us to do so.
>
> On Mar 23, 2012, at 6:20 AM, Pavel Mezentsev wrote:
>
> > Hello
> >
> > I'm doing some testing with IMB and discovered a strange thing:
> >
> > Since I have a system with the new AMD Opteron 6276 processors, I'm using
> > 1.5.5rc3 because it supports binding to cores.
> >
> > But when I run the barrier test from the Intel MPI Benchmarks, the best I
> > get is:
> > #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
> >   598 15159.56 15211.05 15184.70
> >  (/opt/openmpi-1.5.5rc3/intel12/bin/mpirun -x OMP_NUM_THREADS=1
>  -hostfile hosts_all2all_2 -npernode 32 --mca btl openib,sm,self -mca
> coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm 1 -np 256
> openmpi-1.5.5rc3/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 256
> barrier)
> >
> > And with openmpi 1.5.4 the result is much better:
> > #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
> >  1000   113.23   113.33   113.28
> >
> > (/opt/openmpi-1.5.4/intel12/bin/mpirun -x OMP_NUM_THREADS=1  -hostfile
> hosts_all2all_2 -npernode 32 --mca btl openib,sm,self -mca
> coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm 3 -np 256
> openmpi-1.5.4/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 256
> barrier)
> >
> > and still I couldn't come close to the result I got with mvapich:
> > #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
> >  1000    17.51    17.53    17.53
> >
> > (/opt/mvapich2-1.8/intel12/bin/mpiexec.hydra -env OMP_NUM_THREADS 1
> -hostfile hosts_all2all_2 -np 256 mvapich2-1.8/intel12/IMB-MPI1 -mem 2
> -off_cache 16,64 -msglog 1:16 -npmin 256 barrier)
> >
> > I don't know whether this is a bug or me doing something the wrong way. So
> > is there a way to improve my results?
> >
> > Best regards,
> > Pavel Mezentsev
> >
> >
>
>
>


[OMPI devel] barrier problem

2012-03-23 Thread Pavel Mezentsev
Hello

I'm doing some testing with IMB and discovered a strange thing:

Since I have a system with the new AMD Opteron 6276 processors, I'm using
1.5.5rc3 because it supports binding to cores.

But when I run the barrier test from the Intel MPI Benchmarks, the best I get
is:
#repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
  598 15159.56 15211.05 15184.70
 (/opt/openmpi-1.5.5rc3/intel12/bin/mpirun -x OMP_NUM_THREADS=1  -hostfile
hosts_all2all_2 -npernode 32 --mca btl openib,sm,self -mca
coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm 1 -np 256
openmpi-1.5.5rc3/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 256
barrier)

And with openmpi 1.5.4 the result is much better:
#repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
 1000   113.23   113.33   113.28

(/opt/openmpi-1.5.4/intel12/bin/mpirun -x OMP_NUM_THREADS=1  -hostfile
hosts_all2all_2 -npernode 32 --mca btl openib,sm,self -mca
coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm 3 -np 256
openmpi-1.5.4/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 256
barrier)

and still I couldn't come close to the result I got with mvapich:
#repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
 1000    17.51    17.53    17.53

(/opt/mvapich2-1.8/intel12/bin/mpiexec.hydra -env OMP_NUM_THREADS 1
-hostfile hosts_all2all_2 -np 256 mvapich2-1.8/intel12/IMB-MPI1 -mem 2
-off_cache 16,64 -msglog 1:16 -npmin 256 barrier)

I don't know whether this is a bug or me doing something the wrong way. So is
there a way to improve my results?

Best regards,
Pavel Mezentsev