I have successfully built openmpi-1.1a1r9260 (from the Subversion trunk)
in 64-bit mode on Solaris Opteron.
This r9260 tarball incorporates the latest Solaris patches from Brian
Barrett.
To speed up the build, I disabled the f90 bindings. My build script is
as follows:
#! /bin/tcsh -v
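# Sun Studio compilers; -xtarget=opteron -xarch=amd64 requests 64-bit
# AMD64 code generation, and "ld -64" forces a 64-bit link.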
setenv CC "cc"
setenv CXX "CC"
setenv F77 "f95"
setenv FC "f95"
setenv CFLAGS "-g -O -xtarget=opteron -xarch=amd64"
setenv CXXFLAGS "-O -xtarget=opteron -xarch=amd64"
setenv FFLAGS "-O -xtarget=opteron -xarch=amd64"
setenv FCFLAGS "-O1 -xtarget=opteron -xarch=amd64"
setenv LD "ld -64"
setenv LDFLAGS "-xtarget=opteron -xarch=amd64"
setenv CXXLDFLAGS "-xtarget=opteron -xarch=amd64"
./configure --prefix=/users/valiron/lib/openmpi-1.1a1r9260 --disable-mpi-f90
gmake
gmake install
- The build is fine
- Standard output under mpirun is now fixed and behaves as expected
- Processor binding is functional (mpirun --mca mpi_paffinity_alone 1)
and performance is improved with this option (tested on quad-processor
v40z SMP nodes)
- The latency is very low. A rotating-buffer test (each task passes a
buffer to its neighbour around a ring; a sketch of such a benchmark
follows the LAM/MPI comparison below) gives the following performance on
a quad-processor v40z:
valiron@n33 ~/BENCHES > mpirun --mca mpi_paffinity_alone 1 -np 4 rotate
NPROCS 4
buf_size, sent/node, iter_time (s), rate/node (MB/s)
8 400 0.000003 4.958
16 400 0.000003 8.758
32 400 0.000003 23.252
64 400 0.000003 35.692
128 400 0.000003 86.414
256 400 0.000003 162.218
512 400 0.000003 281.609
1024 40 0.000013 144.927
2048 40 0.000019 210.051
4096 40 0.000014 556.097
8192 40 0.000021 748.555
16384 40 0.000033 943.303
32768 40 0.000061 1024.600
65536 40 0.000112 1116.338
131072 40 0.000215 1161.857
262144 40 0.000427 1172.068
524288 40 0.000964 1037.103
1048576 8 0.002225 898.679
2097152 8 0.005317 752.287
4194304 8 0.008794 909.763
8388608 8 0.017402 919.429
16777216 8 0.034699 922.217
For comparison, here is the data I obtain with LAM/MPI (lam-7.1.2b28)
on the same platform:
valiron@n15 ~/BENCHES > mpirun -ssi rpi usysv -np 4 rotate
NPROCS 4
buf_size, sent/node, iter_time (s), rate/node (MB/s)
8 400 0.000007 2.160
16 400 0.000007 4.579
32 400 0.000007 9.175
64 400 0.000007 18.350
128 400 0.000007 34.925
256 400 0.000007 69.731
512 400 0.000007 132.998
1024 40 0.000008 230.598
2048 40 0.000010 399.610
4096 40 0.000013 592.014
8192 40 0.000018 845.899
16384 40 0.000035 902.544
32768 40 0.000093 668.991
65536 40 0.000169 738.226
131072 40 0.000364 687.140
262144 40 0.000806 620.308
524288 40 0.001631 613.281
1048576 8 0.003703 540.086
2097152 8 0.004849 824.828
4194304 8 0.009545 838.101
8388608 8 0.019937 802.523
16777216 8 0.037989 842.349
Open MPI offers much lower latency than LAM/MPI (3 us instead of 7 us)
and also delivers higher throughput. This is very promising!
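For reference, here is a minimal C sketch of a rotating-buffer ring
benchmark in the spirit of the rotate program used above. The actual
rotate source is not shown in this message, so the buffer sizes, the
iteration counts and the per-node rate formula (2 * buf_size / iter_time,
with buf_size in bytes, which appears to reproduce the printed columns)
are assumptions inferred from the tables, not the original code.

/* rotate_sketch.c -- minimal rotating-buffer ring benchmark (sketch).
 * Each rank sends its buffer to the right neighbour and receives one
 * from the left neighbour; the reported rate counts both directions. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (rank == 0)
        printf("NPROCS %d\nbuf_size, sent/node, iter_time (s), rate/node (MB/s)\n",
               nprocs);

    for (long buf_size = 8; buf_size <= 16L * 1024 * 1024; buf_size *= 2) {
        /* Iteration counts chosen to mimic the "sent/node" column above. */
        int iters = (buf_size <= 512) ? 400 : (buf_size <= 524288 ? 40 : 8);
        char *sendbuf = malloc(buf_size);
        char *recvbuf = malloc(buf_size);
        memset(sendbuf, rank, buf_size);

        int right = (rank + 1) % nprocs;
        int left  = (rank + nprocs - 1) % nprocs;

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            /* Pass the buffer around the ring: send right, receive from left. */
            MPI_Sendrecv(sendbuf, (int)buf_size, MPI_CHAR, right, 0,
                         recvbuf, (int)buf_size, MPI_CHAR, left, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        double iter_time = (MPI_Wtime() - t0) / iters;

        if (rank == 0) {
            /* Assumed formula: count both the send and the receive. */
            double rate = 2.0 * buf_size / iter_time / (1024.0 * 1024.0);
            printf("%10ld %6d %12.6f %12.3f\n", buf_size, iters, iter_time, rate);
        }
        free(sendbuf);
        free(recvbuf);
    }

    MPI_Finalize();
    return 0;
}

Compiled with the Open MPI mpicc wrapper, it can be launched with the same
mpirun line as above (mpirun --mca mpi_paffinity_alone 1 -np 4 rotate).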
- Finally, I could successfully run my production ab initio quantum
chemistry code DIRCCR12 with Open MPI.
Congratulations to the Open MPI folks!
Pierre V.
PS. Open MPI's performance over Gigabit Ethernet does not seem as good
as LAM/MPI's. I'll do more testing after browsing the Ethernet-related
messages on the list. I'll also check whether spreading traffic over two
Ethernet NICs helps.
--
Support the SAUVONS LA RECHERCHE movement:
http://recherche-en-danger.apinc.org/
Dr. Pierre VALIRON
Laboratoire d'Astrophysique
Observatoire de Grenoble / UJF
BP 53 F-38041 Grenoble Cedex 9 (France)
http://www-laog.obs.ujf-grenoble.fr/~valiron/
Mail: pierre.vali...@obs.ujf-grenoble.fr
Phone: +33 4 7651 4787 Fax: +33 4 7644 8821