I have recently installed Open MPI 1.3r1212a over TCP on gigabit Ethernet
on a Solaris 10 x86/64 system.

The following test codes (example codes bundled with Sun HPC) all compile
fine using the Open MPI versions of mpicc, mpif95 and mpic++:

  monte, a Monte Carlo estimate of pi (sketched below);
  connectivity, which tests connectivity between processes and nodes;
  prime, which calculates prime numbers.
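
For context, monte is essentially a parallel Monte Carlo estimate of pi;
a minimal C/MPI sketch of that kind of program (my own illustration, not
the actual Sun HPC source; the sample count and seeding are made up) is:

/* monte_sketch.c -- illustrative only, not the Sun HPC "monte" code */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    long i, nsamples = 1000000, local_hits = 0, total_hits = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* each rank draws its own random points with a rank-dependent seed */
    srand48(rank + 1);
    for (i = 0; i < nsamples; i++) {
        double x = drand48(), y = drand48();
        if (x * x + y * y <= 1.0)
            local_hits++;
    }

    /* sum the hit counts on rank 0 and print the estimate */
    MPI_Reduce(&local_hits, &total_hits, 1, MPI_LONG, MPI_SUM,
               0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("Monte-Carlo estimate of pi by %4d processes is %f.\n",
               size, 4.0 * total_hits / ((double)nsamples * size));

    MPI_Finalize();
    return 0;
}

Something along those lines builds with the Open MPI wrapper, e.g.
mpicc -o monte monte_sketch.c.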

Sometimes the jobs run fine, but most of the time they freeze,
leaving zombie processes behind.

My run-time command is

mpirun --hostfile my-hosts -mca pls_rsh_agent rsh --mca btl tcp,self -np 14 \
monte

and I get the following output:
oberon(209) > mpirun --hostfile my-hosts -mca pls_rsh_agent rsh --mca btl
tcp,self -np 14 monte
Monte-Carlo estimate of pi by   14 processes is 3.141503.

with the cursor hanging.

The process table shows

oberon# ps -eaf | grep dph0elh
 dph0elh  9583  7445   7 17:45:01 pts/26      9:22 mpirun --hostfile my-hosts
-mca pls_rsh_agent rsh --mca btl tcp,self -np 14 mon
 dph0elh  9595  9588   0        - ?           0:02 <defunct>
 dph0elh  9588     1   7 17:45:01 ??          9:03 orted --bootproxy 1 --name
0.0.1 --num_procs 5 --vpid_start 0 --nodename oberon
 dph0elh  7445  6924   0 17:01:38 pts/26      0:00 -tcsh
    root  9656  4151   0 18:01:31 pts/36      0:00 grep dph0elh
 dph0elh  9593  9588   0        - ?           0:02 <defunct>


One of the nodes offers 8 CPUs; the other nodes in the hostfile offer 2 each,
for a total of 14 CPUs. As you can see from the command line,
I use --mca btl tcp,self.
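
For reference, my-hosts uses the standard Open MPI hostfile syntax; it looks
roughly like the sketch below (oberon is the 8-CPU head node, and the names
of the 2-CPU nodes here are placeholders, not the real hostnames):

# my-hosts (sketch): one 8-CPU node plus three 2-CPU nodes = 14 slots
oberon  slots=8
node1   slots=2
node2   slots=2
node3   slots=2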

There are no other interconnects.

I could not find any entry in the FAQs, except for the advice on using
--mca btl tcp,self.




------------------------------------------
Dr E L  Heck

University of Durham
Institute for Computational Cosmology
Ogden Centre
Department of Physics
South Road

DURHAM, DH1 3LE
United Kingdom

e-mail: lydia.h...@durham.ac.uk

Tel.: + 44 191 - 334 3628
Fax.: + 44 191 - 334 3645
___________________________________________
