I am now attempting to tune openmpi-1.1a1r9260 on Solaris Opteron.

Each quad-processor node has two Ethernet interfaces, bge0 and bge1.
The bge0 interfaces are dedicated to parallel jobs and correspond to node names pxx;
they use a dedicated gigabit switch.
The bge1 interfaces provide NFS sharing etc. and correspond to node names nxx, over another gigabit switch.

1) I allocated 4 quad-processor nodes.
As documented in the FAQ, mpirun -np 4 -hostfile $OAR_FILE_NODES runs 4 tasks on the first SMP node, and mpirun -np 4 -hostfile $OAR_FILE_NODES --bynode places one task on each node.
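
To make the two placements concrete, here is a small Python sketch of how I understand them. It is only an illustration of the behaviour I observed, not Open MPI code: the function place_ranks, the node names p01..p04 and the 4 slots per node are assumptions chosen to match my allocation.

```python
def place_ranks(nodes, slots_per_node, nprocs, by_node=False):
    """Return the node assigned to each rank, in rank order."""
    if nprocs > len(nodes) * slots_per_node:
        raise ValueError("more processes than available slots")
    if by_node:
        # Round-robin: consecutive ranks go to consecutive nodes.
        return [nodes[rank % len(nodes)] for rank in range(nprocs)]
    # By slot: fill every slot of a node before moving to the next one.
    return [nodes[rank // slots_per_node] for rank in range(nprocs)]


nodes = ["p01", "p02", "p03", "p04"]  # hypothetical node names

# Default placement: all 4 ranks land on the first 4-way node.
print(place_ranks(nodes, 4, 4))

# With --bynode: one rank per node.
print(place_ranks(nodes, 4, 4, by_node=True))
```

The first call returns p01 four times and the second returns each of the four nodes once, which matches what I see with and without --bynode.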

2) According to the users list, mpirun --mca pml teg should revert to the 2nd-generation TCP transport instead of the default ob1 (3rd generation). Unfortunately I get the message
No available pml components were found!
Have you removed the 2nd-generation TCP transport? Do you consider the new ob1 competitive now?

3) According to the users list, tuned collective primitives are available. Apparently they are now compiled by default, but they don't seem functional at all:

mpirun --mca coll tuned
Signal:11 info.si_errno:0(Error 0) si_code:1(SEGV_MAPERR)
Failing at addr:0
*** End of error message ***

4) According to the FAQ and the users list, openmpi attempts to discover and use all interfaces. I tried to force the use of bge0 only, without success.

mpirun --mca btl_tcp_if_exclude bge1
[n33:04784] *** An error occurred in MPI_Barrier
[n33:04784] *** on communicator MPI_COMM_WORLD
[n33:04784] *** MPI_ERR_INTERN: internal error
[n33:04784] *** MPI_ERRORS_ARE_FATAL (goodbye)
1 process killed (possibly by Open MPI)

The FAQ states that a new syntax should be available soon. I tested whether it is already implemented in openmpi-1.1a1r9260:

mpirun --mca btl_tcp_if ^bge0,bge1
mpirun --mca btl_tcp_if ^bge1
Both commands run, with identical performance.

However, I doubt this option is functional, because if I disable all Ethernet interfaces,
mpirun --mca btl_tcp_if ^bge0,bge1
the job still works!
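
To spell out what I expected from the ^ syntax: I read a leading ^ as "exclude the listed interfaces". The Python sketch below only encodes that reading; it is not Open MPI code, and the function select_interfaces and the lo0 loopback entry are assumptions added for illustration.

```python
def select_interfaces(available, spec):
    """Apply an include list, or an exclude list if spec starts with '^'."""
    exclude = spec.startswith("^")
    if exclude:
        spec = spec[1:]
    names = [name for name in spec.split(",") if name]
    if exclude:
        # Keep every interface that is not explicitly listed.
        return [iface for iface in available if iface not in names]
    # Keep only the interfaces that are explicitly listed.
    return [iface for iface in available if iface in names]


# Interfaces on one node; lo0 (loopback) is an assumption.
available = ["lo0", "bge0", "bge1"]

print(select_interfaces(available, "^bge1"))
print(select_interfaces(available, "^bge0,bge1"))
```

Under this reading, excluding both bge0 and bge1 leaves only the loopback, so I would not expect a job spread over 4 nodes to communicate over TCP at all.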

I would be happy to have more control over the interfaces being used.

What is expected to work on other platforms?
Which issues might be specific to Solaris on Opteron?

Have a nice openmpi day!

--
Support the SAUVONS LA RECHERCHE movement:
http://recherche-en-danger.apinc.org/

       _/_/_/_/    _/       _/       Dr. Pierre VALIRON
      _/     _/   _/      _/   Laboratoire d'Astrophysique
     _/     _/   _/     _/    Observatoire de Grenoble / UJF
    _/_/_/_/    _/    _/    BP 53  F-38041 Grenoble Cedex 9 (France)
   _/          _/   _/    http://www-laog.obs.ujf-grenoble.fr/~valiron/
  _/          _/  _/     Mail: pierre.vali...@obs.ujf-grenoble.fr
 _/          _/ _/      Phone: +33 4 7651 4787  Fax: +33 4 7644 8821
_/ _/_/
