Dear Rolf,
Thanks for your reply.
I've created another PE and changed the submission script to explicitly specify the hostnames with "--host".
However, the result is the same.
# qconf -sp orte
pe_name            orte
slots              8
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE
$ cat hpl-8cpu-test.sge
#!/bin/bash
#
#$ -N HPL_8cpu_GB
#$ -pe orte 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
cd /home/admin/hpl-2.0
/opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS --host node0001,node0001,node0001,node0001,node0002,node0002,node0002,node0002 ./bin/goto-openmpi-gcc/xhpl
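For reference, instead of hardcoding the hostnames I was thinking of building the --host list from SGE's $PE_HOSTFILE; this is only a sketch of the idea, not what the job above actually ran:

# Sketch: build a comma-separated host list from $PE_HOSTFILE
# (one line per host: "hostname slots queue processors"),
# repeating each hostname once per allocated slot.
HOSTLIST=$(awk '{for (i = 0; i < $2; i++) printf "%s,", $1}' $PE_HOSTFILE | sed 's/,$//')
/opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS --host $HOSTLIST ./bin/goto-openmpi-gcc/xhpl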
# pdsh -a ps ax --width=200|grep hpl
node0002: 18901 ?  S    0:00 /opt/openmpi-gcc/bin/mpirun -v -np 8 --host node0001,node0001,node0001,node0001,node0002,node0002,node0002,node0002 ./bin/goto-openmpi-gcc/xhpl
node0002: 18902 ? RLl 0:29 ./bin/goto-openmpi-gcc/xhpl
node0002: 18903 ? RLl 0:29 ./bin/goto-openmpi-gcc/xhpl
node0002: 18904 ? RLl 0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18905 ? RLl 0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18906 ? RLl 0:29 ./bin/goto-openmpi-gcc/xhpl
node0002: 18907 ? RLl 0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18908 ? RLl 0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18909 ? RLl 0:28 ./bin/goto-openmpi-gcc/xhpl
Any hints on how to debug this situation?
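One check I plan to try next is running hostname through the same PE so that only the placement is tested; this is just a sketch, and I'm assuming the --display-map option is available in this Open MPI 1.3.1 build:

# Sketch: launch plain hostname under the same allocation and ask
# mpirun to print its process map, to see where the 8 ranks land.
/opt/openmpi-gcc/bin/mpirun -np $NSLOTS --display-map hostname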
Also, if I have 2 IB ports in each node, with IB bonding configured, will Open MPI automatically benefit from the doubled bandwidth?
Thanks a lot.
Best Regards,
PN
2009/4/1 Rolf Vandevaart <rolf.vandeva...@sun.com>:
On 03/31/09 11:43, PN wrote:
Dear all,
I'm using Open MPI 1.3.1 and SGE 6.2u2 on CentOS 5.2.
I have 2 compute nodes for testing; each node has a single quad-core CPU.
Here is my submission script and PE config:
$ cat hpl-8cpu.sge
#!/bin/bash
#
#$ -N HPL_8cpu_IB
#$ -pe mpi-fu 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
cd /home/admin/hpl-2.0
# For IB
/opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS -machinefile $TMPDIR/machines ./bin/goto-openmpi-gcc/xhpl
I've tested that the mpirun command runs correctly from the command line.
$ qconf -sp mpi-fu
pe_name            mpi-fu
slots              8
user_lists         NONE
xuser_lists        NONE
start_proc_args    /opt/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
stop_proc_args     /opt/sge/mpi/stopmpi.sh
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE
I've checked $TMPDIR/machines after submitting; it was correct:
node0002
node0002
node0002
node0002
node0001
node0001
node0001
node0001
However, I found that if I explicitly specify "-machinefile $TMPDIR/machines", all 8 MPI processes were spawned on a single node, i.e. node0002.
However, if I omit "-machinefile $TMPDIR/machines" from the mpirun line, i.e.
/opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS ./bin/goto-openmpi-gcc/xhpl
then the MPI processes start correctly: 4 processes on node0001 and 4 on node0002.
Is this normal behaviour of Open MPI?
I just tried it both ways and I got the same result both times: the processes are split between the nodes. Perhaps, to be extra sure, you could just run hostname? And for what it is worth, as you have seen, you do not need to specify a machines file; Open MPI will use the nodes that were allocated by SGE. You can also change your parallel queue to not run any scripts, like this:
start_proc_args /bin/true
stop_proc_args /bin/true
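For example, a minimal hostname test job could look something like this (just a sketch; adjust the PE name and mpirun path to match your installation):

#!/bin/bash
#$ -N hostname_test
#$ -pe mpi-fu 8
#$ -cwd
#$ -j y
# Print the hostname from every allocated slot so you can see
# which nodes the 8 processes actually land on.
/opt/openmpi-gcc/bin/mpirun -np $NSLOTS hostname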
Also, I wondered: if I have an IB interface, for example with the IB hostnames being node0001-clust and node0002-clust, will Open MPI automatically use the IB interface?
Yes, it should use the IB interface.
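If you want to verify that, one option (just a suggestion, not required) is to restrict the BTLs on the mpirun command line so the job fails instead of silently falling back to TCP:

# Allow only the openib (InfiniBand), shared-memory and self BTLs;
# if IB is not usable, inter-node communication will error out
# rather than quietly using Ethernet.
mpirun --mca btl openib,sm,self -np $NSLOTS ./bin/goto-openmpi-gcc/xhpl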
How about if I have 2 IB ports in each node, with IB bonding configured? Will Open MPI automatically benefit from the doubled bandwidth?
Thanks a lot.
Best Regards,
PN
--
=========================
rolf.vandeva...@sun.com
781-442-3043
=========================
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users