Thanks.

$ cat hpl-8cpu-test.sge
#!/bin/bash
#
#$ -N HPL_8cpu_GB
#$ -pe orte 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
/opt/openmpi-gcc/bin/mpirun --display-allocation --display-map -v -np $NSLOTS --host node0001,node0002 hostname
$ cat HPL_8cpu_GB.o46

======================   ALLOCATED NODES   ======================
 Data for node: Name: node0001                Num slots: 4  Max slots: 0
 Data for node: Name: node0002.v5cluster.com  Num slots: 4  Max slots: 0
=================================================================

 ========================   JOB MAP   ========================

 Data for node: Name: node0001  Num procs: 8
        Process OMPI jobid: [10982,1] Process rank: 0
        Process OMPI jobid: [10982,1] Process rank: 1
        Process OMPI jobid: [10982,1] Process rank: 2
        Process OMPI jobid: [10982,1] Process rank: 3
        Process OMPI jobid: [10982,1] Process rank: 4
        Process OMPI jobid: [10982,1] Process rank: 5
        Process OMPI jobid: [10982,1] Process rank: 6
        Process OMPI jobid: [10982,1] Process rank: 7
 =============================================================
node0001
node0001
node0001
node0001
node0001
node0001
node0001
node0001

I'm not sure why node0001 is missing the domain name -- is this related?
However, the result is correct when I run "qconf -sel":

$ qconf -sel
node0001.v5cluster.com
node0002.v5cluster.com

2009/4/1 Ralph Castain <r...@lanl.gov>

> Rolf has correctly reminded me that display-allocation occurs prior to host
> filtering, so you will see all of the allocated nodes. You'll see the impact
> of the host specifications in display-map.
>
> Sorry for the confusion - thanks to Rolf for pointing it out.
> Ralph
>
>
> On Apr 1, 2009, at 7:40 AM, Ralph Castain wrote:
>
>> As an FYI: you can debug allocation issues more easily by:
>>
>> mpirun --display-allocation --do-not-launch -n 1 foo
>>
>> This will read the allocation, do whatever host filtering you specify with
>> -host and -hostfile options, report out the result, and then terminate
>> without trying to launch anything. I found it most useful for debugging
>> these situations.
>>
>> If you want to know where the procs would have gone, then you can do:
>>
>> mpirun --display-allocation --display-map --do-not-launch -n 8 foo
>>
>> In this case, the #procs you specify needs to be the number you actually
>> wanted so that the mapper will run properly. However, the executable can be
>> bogus and nothing will actually launch. It's the closest you can come to a
>> dry run of a job.
>>
>> HTH
>> Ralph
>>
>>
>> On Apr 1, 2009, at 7:10 AM, Rolf Vandevaart wrote:
>>
>>> It turns out that the use of --host and --hostfile acts as a filter of
>>> which nodes to run on when you are running under SGE. So, listing them
>>> several times does not affect where the processes land. However, this
>>> still does not explain why you are seeing what you are seeing. One thing
>>> you can try is to add this to the mpirun command:
>>>
>>> -mca ras_gridengine_verbose 100
>>>
>>> This will provide some additional information as to what Open MPI is
>>> seeing as nodes and slots from SGE. (Is there any chance that node0002
>>> actually has 8 slots?)
>>>
>>> I just retried on my cluster of 2-CPU SPARC Solaris nodes. When I run
>>> with np=2, the two MPI processes all land on a single node, because that
>>> node has two slots. When I go up to np=4, they move on to the other node.
>>> The --host acts as a filter to where they should run.
>>>
>>> In terms of using "IB bonding", I do not know what that means exactly.
>>> Open MPI does stripe over multiple IB interfaces, so I think the answer
>>> is yes.
>>>
>>> Rolf
>>>
>>> PS: Here is what my np=4 job script looked like. (I just changed np=2
>>> for the other run.)
>>>
>>> burl-ct-280r-0 148 => more run.sh
>>> #!/bin/bash
>>> #$ -S /bin/bash
>>> #$ -V
>>> #$ -cwd
>>> #$ -N Job1
>>> #$ -pe orte 200
>>> #$ -j y
>>> #$ -l h_rt=00:20:00   # Run time (hh:mm:ss) - 10 min
>>>
>>> echo $NSLOTS
>>> /opt/SUNWhpc/HPC8.2/sun/bin/mpirun -mca ras_gridengine_verbose 100 -v -np 4 -host burl-ct-280r-1,burl-ct-280r-0 -mca btl self,sm,tcp hostname
>>>
>>> Here is the output (somewhat truncated):
>>>
>>> burl-ct-280r-0 150 => more Job1.o199
>>> 200
>>> [burl-ct-280r-2:22132] ras:gridengine: JOB_ID: 199
>>> [burl-ct-280r-2:22132] ras:gridengine: PE_HOSTFILE: /ws/ompi-tools/orte/sge/sge6_2u1/default/spool/burl-ct-280r-2/active_jobs/199.1/pe_hostfile
>>> [..snip..]
>>> [burl-ct-280r-2:22132] ras:gridengine: burl-ct-280r-0: PE_HOSTFILE shows slots=2
>>> [burl-ct-280r-2:22132] ras:gridengine: burl-ct-280r-1: PE_HOSTFILE shows slots=2
>>> [..snip..]
>>> burl-ct-280r-1
>>> burl-ct-280r-1
>>> burl-ct-280r-0
>>> burl-ct-280r-0
>>> burl-ct-280r-0 151 =>
>>>
>>>
>>> On 03/31/09 22:39, PN wrote:
>>>
>>>> Dear Rolf,
>>>> Thanks for your reply.
>>>> I've created another PE and changed the submission script, explicitly
>>>> specifying the hostname with "--host". However, the result is the same.
>>>>
>>>> # qconf -sp orte
>>>> pe_name            orte
>>>> slots              8
>>>> user_lists         NONE
>>>> xuser_lists        NONE
>>>> start_proc_args    /bin/true
>>>> stop_proc_args     /bin/true
>>>> allocation_rule    $fill_up
>>>> control_slaves     TRUE
>>>> job_is_first_task  FALSE
>>>> urgency_slots      min
>>>> accounting_summary TRUE
>>>>
>>>> $ cat hpl-8cpu-test.sge
>>>> #!/bin/bash
>>>> #
>>>> #$ -N HPL_8cpu_GB
>>>> #$ -pe orte 8
>>>> #$ -cwd
>>>> #$ -j y
>>>> #$ -S /bin/bash
>>>> #$ -V
>>>> #
>>>> cd /home/admin/hpl-2.0
>>>> /opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS --host node0001,node0001,node0001,node0001,node0002,node0002,node0002,node0002 ./bin/goto-openmpi-gcc/xhpl
>>>>
>>>> # pdsh -a ps ax --width=200 | grep hpl
>>>> node0002: 18901 ?  S    0:00 /opt/openmpi-gcc/bin/mpirun -v -np 8 --host node0001,node0001,node0001,node0001,node0002,node0002,node0002,node0002 ./bin/goto-openmpi-gcc/xhpl
>>>> node0002: 18902 ?  RLl  0:29 ./bin/goto-openmpi-gcc/xhpl
>>>> node0002: 18903 ?  RLl  0:29 ./bin/goto-openmpi-gcc/xhpl
>>>> node0002: 18904 ?  RLl  0:28 ./bin/goto-openmpi-gcc/xhpl
>>>> node0002: 18905 ?  RLl  0:28 ./bin/goto-openmpi-gcc/xhpl
>>>> node0002: 18906 ?  RLl  0:29 ./bin/goto-openmpi-gcc/xhpl
>>>> node0002: 18907 ?  RLl  0:28 ./bin/goto-openmpi-gcc/xhpl
>>>> node0002: 18908 ?  RLl  0:28 ./bin/goto-openmpi-gcc/xhpl
>>>> node0002: 18909 ?  RLl  0:28 ./bin/goto-openmpi-gcc/xhpl
>>>>
>>>> Any hint to debug this situation?
>>>> Also, if I have 2 IB ports in each node, with IB bonding done, will
>>>> Open MPI automatically benefit from the double bandwidth?
>>>> Thanks a lot.
>>>> Best Regards,
>>>> PN
>>>>
>>>> 2009/4/1 Rolf Vandevaart <rolf.vandeva...@sun.com <mailto:rolf.vandeva...@sun.com>>
>>>>
>>>>     On 03/31/09 11:43, PN wrote:
>>>>
>>>>         Dear all,
>>>>         I'm using Open MPI 1.3.1 and SGE 6.2u2 on CentOS 5.2.
>>>>         I have 2 compute nodes for testing; each node has a single
>>>>         quad-core CPU.
>>>>         Here is my submission script and PE config:
>>>>
>>>>         $ cat hpl-8cpu.sge
>>>>         #!/bin/bash
>>>>         #
>>>>         #$ -N HPL_8cpu_IB
>>>>         #$ -pe mpi-fu 8
>>>>         #$ -cwd
>>>>         #$ -j y
>>>>         #$ -S /bin/bash
>>>>         #$ -V
>>>>         #
>>>>         cd /home/admin/hpl-2.0
>>>>         # For IB
>>>>         /opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS -machinefile $TMPDIR/machines ./bin/goto-openmpi-gcc/xhpl
>>>>
>>>>         I've tested that the mpirun command runs correctly on the
>>>>         command line.
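[Editorial sketch: Rolf's ras:gridengine lines above come straight from the SGE PE_HOSTFILE. Assuming the standard four-column SGE pe_hostfile layout (hostname, slots, queue, processor range) and using made-up host names that mirror this thread, the per-slot host expansion that Open MPI derives from that file can be reproduced with a one-line awk filter. This is an illustration, not Open MPI's actual code.]

```shell
#!/bin/sh
# Hypothetical pe_hostfile in SGE's four-column format:
#   hostname  slots  queue  processor-range
# Host names and slot counts echo the thread, not a real job.
pe_hostfile=$(mktemp)
cat > "$pe_hostfile" <<'EOF'
node0001.v5cluster.com 4 all.q@node0001.v5cluster.com UNDEFINED
node0002.v5cluster.com 4 all.q@node0002.v5cluster.com UNDEFINED
EOF

# Print each host once per granted slot: the effective list the
# mapper schedules against, which --host then merely filters.
awk '{ for (i = 0; i < $2; i++) print $1 }' "$pe_hostfile"

rm -f "$pe_hostfile"
```

With np=8 and the $fill_up rule, this expansion fills node0001's four slots before moving on to node0002, which matches Rolf's description of --host acting only as a filter over the SGE allocation.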
>>>>         $ qconf -sp mpi-fu
>>>>         pe_name            mpi-fu
>>>>         slots              8
>>>>         user_lists         NONE
>>>>         xuser_lists        NONE
>>>>         start_proc_args    /opt/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
>>>>         stop_proc_args     /opt/sge/mpi/stopmpi.sh
>>>>         allocation_rule    $fill_up
>>>>         control_slaves     TRUE
>>>>         job_is_first_task  FALSE
>>>>         urgency_slots      min
>>>>         accounting_summary TRUE
>>>>
>>>>         I've checked $TMPDIR/machines after submitting; it was correct:
>>>>         node0002
>>>>         node0002
>>>>         node0002
>>>>         node0002
>>>>         node0001
>>>>         node0001
>>>>         node0001
>>>>         node0001
>>>>
>>>>         However, I found that if I explicitly specify "-machinefile
>>>>         $TMPDIR/machines", all 8 MPI processes are spawned within a
>>>>         single node, i.e. node0002.
>>>>         If I omit "-machinefile $TMPDIR/machines" from the mpirun
>>>>         line, i.e.
>>>>         /opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS ./bin/goto-openmpi-gcc/xhpl
>>>>         the MPI processes start correctly: 4 processes on node0001
>>>>         and 4 processes on node0002.
>>>>         Is this normal behaviour of Open MPI?
>>>>
>>>>     I just tried it both ways and I got the same result both times. The
>>>>     processes are split between the nodes. Perhaps to be extra sure,
>>>>     you can just run hostname? And for what it is worth, as you have
>>>>     seen, you do not need to specify a machines file. Open MPI will use
>>>>     the ones that were allocated by SGE. You can also change your
>>>>     parallel queue to not run any scripts, like this:
>>>>
>>>>     start_proc_args    /bin/true
>>>>     stop_proc_args     /bin/true
>>>>
>>>>         Also, I wondered: if I have an IB interface, for example the
>>>>         hostnames on IB become node0001-clust and node0002-clust, will
>>>>         Open MPI automatically use the IB interface?
>>>>
>>>>     Yes, it should use the IB interface.
>>>>
>>>>         How about if I have 2 IB ports in each node, with IB bonding
>>>>         done, will Open MPI automatically benefit from the double
>>>>         bandwidth?
>>>>         Thanks a lot.
>>>>         Best Regards,
>>>>         PN
>>>
>>> --
>>> =========================
>>> rolf.vandeva...@sun.com
>>> 781-442-3043
>>> =========================
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
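[Editorial sketch: the correctly split $TMPDIR/machines file PN quotes earlier can be sanity-checked without launching anything. The contents below are copied from the thread; the temporary path stands in for $TMPDIR/machines.]

```shell
#!/bin/sh
# Count ranks per host in the machines file SGE generated
# ($fill_up rule, 8 slots over two quad-core nodes).
machines=$(mktemp)
cat > "$machines" <<'EOF'
node0002
node0002
node0002
node0002
node0001
node0001
node0001
node0001
EOF

# Expect 4 slots on each node if the allocation is balanced.
sort "$machines" | uniq -c

rm -f "$machines"
```

If this shows 4 per node but all 8 xhpl processes still land on one host, the problem is in how mpirun interprets the list (as in this thread), not in what SGE allocated.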