I am now attempting to tune openmpi-1.1a1r9260 on Solaris Opteron.
Each quad-processor node has two Ethernet interfaces, bge0 and bge1.
The bge0 interfaces are dedicated to parallel jobs and correspond to node
names pxx; they use a dedicated gigabit switch.
The bge1 interfaces provide NFS sharing etc. and correspond to node names nxx
over another gigabit switch.
1) I allocated 4 quad-processor nodes.
As documented in the FAQ, mpirun -np 4 -hostfile $OAR_FILE_NODES runs 4
tasks on the first SMP node, and mpirun -np 4 -hostfile $OAR_FILE_NODES
--bynode places one task on each node.
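To make the two placement policies concrete, here is a minimal Python sketch of the ordering only. It is a toy model, not Open MPI's scheduler: the function names, the node names p01..p04, and the assumption of 4 slots per node are all made up for illustration.

```python
from itertools import cycle


def place_byslot(nodes, slots_per_node, nprocs):
    """Fill every slot of a node before moving on to the next node."""
    # Expand the node list so each node appears once per slot.
    slots = [node for node in nodes for _ in range(slots_per_node)]
    if nprocs > len(slots):
        raise ValueError("more processes than available slots")
    return slots[:nprocs]


def place_bynode(nodes, slots_per_node, nprocs):
    """Deal processes out round-robin, one node at a time."""
    if nprocs > len(nodes) * slots_per_node:
        raise ValueError("more processes than available slots")
    node_cycle = cycle(nodes)
    return [next(node_cycle) for _ in range(nprocs)]


if __name__ == "__main__":
    nodes = ["p01", "p02", "p03", "p04"]  # hypothetical node names
    print("by slot:", place_byslot(nodes, 4, 4))
    print("by node:", place_bynode(nodes, 4, 4))
```

With 4 processes, the by-slot policy puts all ranks on the first node, while the by-node policy puts one rank on each of the four nodes, which matches the behaviour described above.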
2) According to the users list, mpirun --mca pml teg should revert to the
2nd generation TCP transport instead of the default ob1 (3rd generation).
Unfortunately I get the message
No available pml components were found!
Have you removed the 2nd generation TCP transport? Do you consider the
new ob1 to be competitive now?
3) According to the users list, tuned collective primitives are
available. Apparently they are now compiled by default, but they don't
seem functional at all:
mpirun --mca coll tuned
Signal:11 info.si_errno:0(Error 0) si_code:1(SEGV_MAPERR)
Failing at addr:0
*** End of error message ***
4) According to the FAQ and to the users list, openmpi attempts to
discover and use all interfaces. I attempted to force using bge0 only
with no success.
mpirun --mca btl_tcp_if_exclude bge1
[n33:04784] *** An error occurred in MPI_Barrier
[n33:04784] *** on communicator MPI_COMM_WORLD
[n33:04784] *** MPI_ERR_INTERN: internal error
[n33:04784] *** MPI_ERRORS_ARE_FATAL (goodbye)
1 process killed (possibly by Open MPI)
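For comparison, here is a small Python sketch that assembles the kind of command line I was aiming for. It only builds the argument list and does not run anything. The parameters btl_tcp_if_include and btl_tcp_if_exclude and the "tcp,self" BTL list are the names as I read them in the FAQ; the helper name, the hostfile name "hosts", the program "./a.out", and the idea that the Solaris loopback interface lo0 must be listed explicitly when excluding are my assumptions.

```python
import shlex


def mpirun_tcp_command(nprocs, hostfile, program, include=None, exclude=None):
    """Build an mpirun argv restricting the TCP BTL to chosen interfaces.

    Exactly one of `include` or `exclude` (lists of interface names)
    must be given.
    """
    if (include is None) == (exclude is None):
        raise ValueError("give exactly one of include or exclude")
    argv = [
        "mpirun", "-np", str(nprocs), "-hostfile", hostfile,
        # Allow only TCP plus "self", so no other transport can
        # silently carry the traffic.
        "--mca", "btl", "tcp,self",
    ]
    if include is not None:
        argv += ["--mca", "btl_tcp_if_include", ",".join(include)]
    else:
        argv += ["--mca", "btl_tcp_if_exclude", ",".join(exclude)]
    argv.append(program)
    return argv


if __name__ == "__main__":
    # Hypothetical hostfile and program names.
    print(shlex.join(mpirun_tcp_command(4, "hosts", "./a.out",
                                        include=["bge0"])))
    # Assumption: lo0 is listed too, since an explicit exclude list
    # may replace the default one that covers the loopback interface.
    print(shlex.join(mpirun_tcp_command(4, "hosts", "./a.out",
                                        exclude=["lo0", "bge1"])))
```

Building the list in one place makes it easy to compare an include-only run against an exclude-only run without retyping the options.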
The FAQ states that a new syntax should be available soon. I
checked whether it is already implemented in openmpi-1.1a1r9260:
mpirun --mca btl_tcp_if ^bge0,bge1
mpirun --mca btl_tcp_if ^bge1
Both work, with identical performance.
However, I doubt this option is functional, because if I disable all
Ethernet interfaces,
mpirun --mca btl_tcp_if ^bge0,bge1
the job still works!
I would be happy to have more control over the interfaces being used.
What is expected to work on other platforms?
What issues could be specific to Solaris on Opteron?
Have a nice openmpi day!
--
Support the SAUVONS LA RECHERCHE movement:
http://recherche-en-danger.apinc.org/
Dr. Pierre VALIRON
Laboratoire d'Astrophysique
Observatoire de Grenoble / UJF
BP 53 F-38041 Grenoble Cedex 9 (France)
http://www-laog.obs.ujf-grenoble.fr/~valiron/
Mail: pierre.vali...@obs.ujf-grenoble.fr
Phone: +33 4 7651 4787 Fax: +33 4 7644 8821