Hi,
 
We are experiencing a problem with process allocation on our Open MPI 
cluster. We are using Scyld 4.1 (BPROC), the OFED 1.2 Topspin InfiniBand 
drivers, and Open MPI 1.2.3 plus a patch (to run processes on the head node). 
The hardware consists of a head node and N blades on private Ethernet and 
InfiniBand networks.
 
The command run for these tests is a simple MPI program (called 'hn') which 
prints out the rank and the hostname. The hostname for the head node is 'head' 
and the compute nodes are '.0' ... '.9'.
 
We are using the following hostfiles for this example:
 
hostfile7
-1 max_slots=1
0 max_slots=3
1 max_slots=3
 
hostfile8
-1 max_slots=2
0 max_slots=3
1 max_slots=3
 
hostfile9
-1 max_slots=3
0 max_slots=3
1 max_slots=3
 
Running the following commands:
 
orterun --hostfile hostfile7 -np 7 ./hn
orterun --hostfile hostfile8 -np 8 ./hn
orterun --byslot --hostfile hostfile7 -np 7 ./hn
orterun --byslot --hostfile hostfile8 -np 8 ./hn
 
causes orterun to crash. However,
 
orterun --hostfile hostfile9 -np 9 ./hn
orterun --byslot --hostfile hostfile9 -np 9 ./hn
 
work, outputting the following:
 
0 head
1 head
2 head
3 .0
4 .0
5 .0
6 .0
7 .0
8 .0
 
However, running the following:
 
orterun --bynode --hostfile hostfile7 -np 7 ./hn 
 
works, outputting the following:
 
0 head
1 .0
2 .1
3 .0
4 .1
5 .0
6 .1
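
For comparison, here is a minimal Python sketch of the placement one might 
naively expect from these hostfiles if max_slots is treated as a hard per-node 
cap. It is only a toy model, not Open MPI's actual mapper; parse_hostfile and 
place are hypothetical helper names, and it assumes every hostfile line carries 
a max_slots entry and that node -1 is the head node.

```python
def parse_hostfile(text):
    """Return a list of (node, max_slots) pairs from hostfile text."""
    nodes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, *opts = line.split()
        limits = dict(opt.split("=", 1) for opt in opts)
        nodes.append((name, int(limits["max_slots"])))
    return nodes


def place(nodes, np, policy):
    """Toy placement of np ranks; returns the node id for each rank in order.

    'byslot' fills each node up to max_slots before moving to the next;
    'bynode' deals ranks round-robin across nodes, skipping full ones.
    """
    if np > sum(cap for _, cap in nodes):
        raise ValueError("more processes requested than slots available")
    if policy not in ("byslot", "bynode"):
        raise ValueError("unknown policy: " + policy)

    used = {name: 0 for name, _ in nodes}
    placement = []
    if policy == "byslot":
        for name, cap in nodes:
            while used[name] < cap and len(placement) < np:
                used[name] += 1
                placement.append(name)
    else:
        while len(placement) < np:
            for name, cap in nodes:
                if used[name] < cap and len(placement) < np:
                    used[name] += 1
                    placement.append(name)
    return placement


hostfile7 = parse_hostfile("-1 max_slots=1\n0 max_slots=3\n1 max_slots=3\n")
hostfile9 = parse_hostfile("-1 max_slots=3\n0 max_slots=3\n1 max_slots=3\n")

print(place(hostfile7, 7, "bynode"))  # ['-1', '0', '1', '0', '1', '0', '1']
print(place(hostfile9, 9, "byslot"))  # ['-1', '-1', '-1', '0', '0', '0', '1', '1', '1']
```

Under this model the --bynode result for hostfile7 matches the output above, 
while the hostfile9 run would be expected to put ranks 6-8 on '.1' rather than 
on '.0', where they actually landed.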
 
Is the '--byslot' crash a known problem? Does it have something to do with 
BPROC? Thanks in advance for any assistance!
 
Sean
 
