Thanks. I've tried your suggestion.

$ cat hpl-8cpu-test.sge
#!/bin/bash
#
#$ -N HPL_8cpu_GB
#$ -pe orte 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
/opt/openmpi-gcc/bin/mpirun -mca ras_gridengine_verbose 100 -v -np $NSLOTS --host node0001,node0002 hostname
It allocated 2 nodes to the job, however all the processes were spawned on node0001.

$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch       states
---------------------------------------------------------------------------------
al...@node0001.v5cluster.com   BIPC  0/4/4          4.79     lx24-amd64
     45 0.55500 HPL_8cpu_G admin    r     04/02/2009 00:26:49     4
---------------------------------------------------------------------------------
al...@node0002.v5cluster.com   BIPC  0/4/4          0.00     lx24-amd64
     45 0.55500 HPL_8cpu_G admin    r     04/02/2009 00:26:49     4

$ cat HPL_8cpu_GB.o45
[node0001:03194] ras:gridengine: JOB_ID: 45
[node0001:03194] ras:gridengine: node0001.v5cluster.com: PE_HOSTFILE shows slots=4
[node0001:03194] ras:gridengine: node0002.v5cluster.com: PE_HOSTFILE shows slots=4
node0001
node0001
node0001
node0001
node0001
node0001
node0001
node0001

$ qconf -sq all.q
qname                 all.q
hostlist              @allhosts
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:01:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             blcr
pe_list               make mpi-rr mpi-fu orte
rerun                 FALSE
slots                 4,[node0001=4],[node0002=4]
tmpdir                /tmp
shell                 /bin/sh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY

$ qconf -se node0001
hostname              node0001.v5cluster.com
load_scaling          NONE
complex_values        slots=4
load_values           arch=lx24-amd64,num_proc=4,mem_total=3949.597656M, \
                      swap_total=0.000000M,virtual_total=3949.597656M, \
                      load_avg=2.800000,load_short=0.220000, \
                      load_medium=2.800000,load_long=2.320000, \
                      mem_free=3818.746094M,swap_free=0.000000M, \
                      virtual_free=3818.746094M,mem_used=130.851562M, \
                      swap_used=0.000000M,virtual_used=130.851562M, \
                      cpu=0.000000,np_load_avg=0.700000, \
                      np_load_short=0.055000,np_load_medium=0.700000, \
                      np_load_long=0.580000
processors            4
user_lists            NONE
xuser_lists           NONE
projects              NONE
xprojects             NONE
usage_scaling         NONE
report_variables      NONE

$ qconf -se node0002
hostname              node0002.v5cluster.com
load_scaling          NONE
complex_values        slots=4
load_values           arch=lx24-amd64,num_proc=4,mem_total=3949.597656M, \
                      swap_total=0.000000M,virtual_total=3949.597656M, \
                      load_avg=0.000000,load_short=0.000000, \
                      load_medium=0.000000,load_long=0.000000, \
                      mem_free=3843.074219M,swap_free=0.000000M, \
                      virtual_free=3843.074219M,mem_used=106.523438M, \
                      swap_used=0.000000M,virtual_used=106.523438M, \
                      cpu=0.000000,np_load_avg=0.000000, \
                      np_load_short=0.000000,np_load_medium=0.000000, \
                      np_load_long=0.000000
processors            4
user_lists            NONE
xuser_lists           NONE
projects              NONE
xprojects             NONE
usage_scaling         NONE
report_variables      NONE

2009/4/1 Rolf Vandevaart <rolf.vandeva...@sun.com>

> It turns out that the use of --host and --hostfile acts as a filter of
> which nodes to run on when you are running under SGE. So, listing them
> several times does not affect where the processes land. However, this
> still does not explain why you are seeing what you are seeing. One thing
> you can try is to add this to the mpirun command.
>
> -mca ras_gridengine_verbose 100
>
> This will provide some additional information as to what Open MPI is
> seeing as nodes and slots from SGE. (Is there any chance that node0002
> actually has 8 slots?)
>
> I just retried on my cluster of 2-CPU sparc solaris nodes. When I run
> with np=2, the two MPI processes both land on a single node, because
> that node has two slots. When I go up to np=4, then they move on to the
> other node. The --host acts as a filter to where they should run.
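One way to double-check what SGE actually handed to Open MPI is to sum the slot counts in the PE_HOSTFILE before calling mpirun. A minimal sketch follows; the sample file contents (including the queue name) are hypothetical and only imitate the PE_HOSTFILE format (hostname, slots, queue, processor range). Inside a real job, $PE_HOSTFILE is already set by SGE:

```shell
#!/bin/bash
# Sketch: verify the slot allocation SGE granted to the job.
# In a real SGE job $PE_HOSTFILE points at the generated hostfile;
# the sample written below is hypothetical and only used as a fallback.
cat > /tmp/pe_hostfile.sample <<'EOF'
node0001.v5cluster.com 4 all.q@node0001.v5cluster.com UNDEFINED
node0002.v5cluster.com 4 all.q@node0002.v5cluster.com UNDEFINED
EOF
PE_HOSTFILE=${PE_HOSTFILE:-/tmp/pe_hostfile.sample}

# Print per-host slots and the total; the total should equal $NSLOTS.
awk '{ total += $2; printf "%s: %d slots\n", $1, $2 }
     END { printf "total: %d\n", total }' "$PE_HOSTFILE"
```

If the total printed here is 8 but all processes still land on node0001, the problem is on the Open MPI side rather than in SGE's allocation.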
>
> In terms of using "IB bonding", I do not know what that means exactly.
> Open MPI does stripe over multiple IB interfaces, so I think the answer
> is yes.
>
> Rolf
>
> PS: Here is what my np=4 job script looked like. (I just changed np=2
> for the other run)
>
> burl-ct-280r-0 148 =>more run.sh
> #! /bin/bash
> #$ -S /bin/bash
> #$ -V
> #$ -cwd
> #$ -N Job1
> #$ -pe orte 200
> #$ -j y
> #$ -l h_rt=00:20:00  # Run time (hh:mm:ss) - 10 min
>
> echo $NSLOTS
> /opt/SUNWhpc/HPC8.2/sun/bin/mpirun -mca ras_gridengine_verbose 100 -v -np 4 -host burl-ct-280r-1,burl-ct-280r-0 -mca btl self,sm,tcp hostname
>
> Here is the output (somewhat truncated):
> burl-ct-280r-0 150 =>more Job1.o199
> 200
> [burl-ct-280r-2:22132] ras:gridengine: JOB_ID: 199
> [burl-ct-280r-2:22132] ras:gridengine: PE_HOSTFILE: /ws/ompi-tools/orte/sge/sge6_2u1/default/spool/burl-ct-280r-2/active_jobs/199.1/pe_hostfile
> [..snip..]
> [burl-ct-280r-2:22132] ras:gridengine: burl-ct-280r-0: PE_HOSTFILE shows slots=2
> [burl-ct-280r-2:22132] ras:gridengine: burl-ct-280r-1: PE_HOSTFILE shows slots=2
> [..snip..]
> burl-ct-280r-1
> burl-ct-280r-1
> burl-ct-280r-0
> burl-ct-280r-0
> burl-ct-280r-0 151 =>
>
> On 03/31/09 22:39, PN wrote:
>
>> Dear Rolf,
>>
>> Thanks for your reply.
>> I've created another PE and changed the submission script, explicitly
>> specifying the hostnames with "--host".
>> However the result is the same.
>>
>> # qconf -sp orte
>> pe_name            orte
>> slots              8
>> user_lists         NONE
>> xuser_lists        NONE
>> start_proc_args    /bin/true
>> stop_proc_args     /bin/true
>> allocation_rule    $fill_up
>> control_slaves     TRUE
>> job_is_first_task  FALSE
>> urgency_slots      min
>> accounting_summary TRUE
>>
>> $ cat hpl-8cpu-test.sge
>> #!/bin/bash
>> #
>> #$ -N HPL_8cpu_GB
>> #$ -pe orte 8
>> #$ -cwd
>> #$ -j y
>> #$ -S /bin/bash
>> #$ -V
>> #
>> cd /home/admin/hpl-2.0
>> /opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS --host node0001,node0001,node0001,node0001,node0002,node0002,node0002,node0002 ./bin/goto-openmpi-gcc/xhpl
>>
>> # pdsh -a ps ax --width=200 | grep hpl
>> node0002: 18901 ?  S   0:00 /opt/openmpi-gcc/bin/mpirun -v -np 8 --host node0001,node0001,node0001,node0001,node0002,node0002,node0002,node0002 ./bin/goto-openmpi-gcc/xhpl
>> node0002: 18902 ?  RLl 0:29 ./bin/goto-openmpi-gcc/xhpl
>> node0002: 18903 ?  RLl 0:29 ./bin/goto-openmpi-gcc/xhpl
>> node0002: 18904 ?  RLl 0:28 ./bin/goto-openmpi-gcc/xhpl
>> node0002: 18905 ?  RLl 0:28 ./bin/goto-openmpi-gcc/xhpl
>> node0002: 18906 ?  RLl 0:29 ./bin/goto-openmpi-gcc/xhpl
>> node0002: 18907 ?  RLl 0:28 ./bin/goto-openmpi-gcc/xhpl
>> node0002: 18908 ?  RLl 0:28 ./bin/goto-openmpi-gcc/xhpl
>> node0002: 18909 ?  RLl 0:28 ./bin/goto-openmpi-gcc/xhpl
>>
>> Any hint for debugging this situation?
>>
>> Also, if I have 2 IB ports in each node, with IB bonding done, will
>> Open MPI automatically benefit from the double bandwidth?
>>
>> Thanks a lot.
>>
>> Best Regards,
>> PN
>>
>> 2009/4/1 Rolf Vandevaart <rolf.vandeva...@sun.com>
>>
>> On 03/31/09 11:43, PN wrote:
>>
>> Dear all,
>>
>> I'm using Open MPI 1.3.1 and SGE 6.2u2 on CentOS 5.2.
>> I have 2 compute nodes for testing; each node has a single quad-core
>> CPU.
>>
>> Here is my submission script and PE config:
>> $ cat hpl-8cpu.sge
>> #!/bin/bash
>> #
>> #$ -N HPL_8cpu_IB
>> #$ -pe mpi-fu 8
>> #$ -cwd
>> #$ -j y
>> #$ -S /bin/bash
>> #$ -V
>> #
>> cd /home/admin/hpl-2.0
>> # For IB
>> /opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS -machinefile $TMPDIR/machines ./bin/goto-openmpi-gcc/xhpl
>>
>> I've tested that the mpirun command runs correctly on the command
>> line.
>>
>> $ qconf -sp mpi-fu
>> pe_name            mpi-fu
>> slots              8
>> user_lists         NONE
>> xuser_lists        NONE
>> start_proc_args    /opt/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
>> stop_proc_args     /opt/sge/mpi/stopmpi.sh
>> allocation_rule    $fill_up
>> control_slaves     TRUE
>> job_is_first_task  FALSE
>> urgency_slots      min
>> accounting_summary TRUE
>>
>> I've checked $TMPDIR/machines after submission; it was correct:
>> node0002
>> node0002
>> node0002
>> node0002
>> node0001
>> node0001
>> node0001
>> node0001
>>
>> However, I found that if I explicitly specify "-machinefile
>> $TMPDIR/machines", all 8 MPI processes are spawned within a single
>> node, i.e. node0002.
>>
>> However, if I omit "-machinefile $TMPDIR/machines" from the mpirun
>> line, i.e.
>> /opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS ./bin/goto-openmpi-gcc/xhpl
>>
>> the MPI processes start correctly, 4 processes on node0001 and 4
>> processes on node0002.
>>
>> Is this normal behaviour of Open MPI?
>>
>> I just tried it both ways and I got the same result both times. The
>> processes are split between the nodes. Perhaps to be extra sure, you
>> can just run hostname? And for what it is worth, as you have seen, you
>> do not need to specify a machines file. Open MPI will use the ones
>> that were allocated by SGE. You can also change your parallel queue to
>> not run any scripts.
>> Like this:
>>
>> start_proc_args /bin/true
>> stop_proc_args  /bin/true
>>
>> Also, I wondered, if I have an IB interface, for example the IB
>> hostnames become node0001-clust and node0002-clust, will Open MPI
>> automatically use the IB interface?
>>
>> Yes, it should use the IB interface.
>>
>> How about if I have 2 IB ports in each node, with IB bonding done,
>> will Open MPI automatically benefit from the double bandwidth?
>>
>> Thanks a lot.
>>
>> Best Regards,
>> PN
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> =========================
> rolf.vandeva...@sun.com
> 781-442-3043
> =========================
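Since $fill_up packs slots onto one node before moving to the next, and --host only filters the SGE allocation, another way to force an even spread is SGE's $round_robin allocation rule. The queue's pe_list above already includes an mpi-rr PE; whether it is actually configured this way is an assumption, so compare with the real output of qconf -sp mpi-rr. A hypothetical sketch:

```shell
# Hypothetical PE definition (compare with "qconf -sp mpi-rr" on the cluster).
# $round_robin hands out one slot per host in turn, so -np 8 over two
# 4-slot nodes places 4 processes on each node.
$ qconf -sp mpi-rr
pe_name            mpi-rr
slots              8
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE
```

With such a PE, the job would be submitted with "#$ -pe mpi-rr 8" and the --host list dropped entirely, letting Open MPI take the layout straight from SGE.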