On Aug 16, 2007, at 5:34 AM, jody wrote:
Just a quick update about my ssh/LD_LIBRARY_PATH problem.
Apparently on my system the sshd was configured not to permit
user-defined environment variables (security reasons?).
To fix that I had to edit the file
/etc/ssh/sshd_config
by changing the entry
#PermitUserEnvironment no
to
PermitUserEnvironment yes
and adding these lines to the file ~/.ssh/environment
PATH=/opt/openmpi/bin:/usr/local/bin:/bin:/usr/bin
LD_LIBRARY_PATH=/opt/openmpi/lib
Maybe it is overkill, but at least ssh now makes the two
variables available,
and simple openmpi test applications run.
That is one option; another option which does not require root-level
changes is simply to modify your shell startup files appropriately.
The FAQ describes which files to modify for each shell.
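As a minimal sketch of the shell-startup approach, assuming a Bourne-style shell such as bash and the /opt/openmpi prefix taken from the ~/.ssh/environment lines quoted above, something like the following could go in a startup file that is read for non-interactive remote logins (which file that is depends on your shell; see the FAQ):

```shell
# Open MPI install prefix; /opt/openmpi is taken from the paths quoted above.
# Adjust it if your installation lives elsewhere.
OMPI_PREFIX=/opt/openmpi

# Prepend the Open MPI directories so that mpirun and its libraries are
# found first, keeping whatever was already set.
PATH="$OMPI_PREFIX/bin:$PATH"
LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PATH LD_LIBRARY_PATH
```

The `${LD_LIBRARY_PATH:+:...}` expansion only adds the separating colon when the variable already has a value, which avoids leaving an empty entry in the search path.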
I have made these fixes on all 7 of my gentoo machines (nano_00 -
nano_06),
and simple openmpi test applications run with any number of processes.
But the fedora machine (plankton) still has problems in some cases.
In the test application I use, process #0 broadcasts a number to all
other processes.
This works in the following cases, always launching from nano_02:
mpirun -np 3 --host nano_00 ./MPITest
mpirun -np 3 --host plankton ./MPITest
mpirun -np 3 --host plankton,nano_00 ./MPITest
But it doesn't work like this:
mpirun -np 4 --host nano_00,plankton ./MPITest
As soon as the MPI_Broadcast statement is reached,
I get an error message:
[nano_00][0,1,0][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=113
You are now technically running in a heterogeneous scenario. You
will likely need to have OMPI and your MPITest executable compiled
separately for each OS (gentoo and fedora). Differences in libc
(etc.) can make a single executable not work properly across both,
and sometimes the problems can be quite subtle / difficult to
diagnose. The easier solution is not to try having a single
executable, but rather to have a separate installation for each OS.
Once you have it set up, you can either rely on the PATH to find the
MPITest that is appropriate for each OS (if you set that up
properly), or you can be explicit with something like the following
(assuming that you have previously created MPITest.gentoo for gentoo
and MPITest.fedora for fedora):
mpirun -np 1 -host gentoo_host MPITest.gentoo : \
-np 1 -host fedora_host MPITest.fedora
Note that we do not actively test such heterogeneous scenarios, but
it should/could/might work... (read: it worked at one time, but I'm
not sure if any of us have tested it in a long time)
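For the PATH-based alternative, here is a rough sketch of what a shell startup file could do. Everything in it is an assumption for illustration: the detect_os helper is hypothetical, it relies on the /etc/gentoo-release and /etc/fedora-release files that those distributions normally ship, and the $HOME/mpi/bin.gentoo and $HOME/mpi/bin.fedora directories are a made-up layout in which each directory holds an executable named MPITest built on that OS.

```shell
# Hypothetical helper: print "gentoo" or "fedora" depending on which
# distribution release file exists under the given root (default: /).
detect_os() {
    root="${1:-}"
    if [ -f "$root/etc/gentoo-release" ]; then
        echo gentoo
    elif [ -f "$root/etc/fedora-release" ]; then
        echo fedora
    else
        echo unknown
    fi
}

# Hypothetical layout: per-OS builds live in $HOME/mpi/bin.gentoo and
# $HOME/mpi/bin.fedora, each containing an executable named MPITest.
PATH="$HOME/mpi/bin.$(detect_os):$PATH"
export PATH
```

With something like this on every node, a plain "mpirun ... MPITest" (no "./" prefix) would pick up whichever build matches the local OS, provided the startup file is actually read for non-interactive logins.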
--
Jeff Squyres
Cisco Systems