Hi,

On 07.04.2012 at 20:54, Skip Coombe wrote:
> I followed your advice by removing the local conf's on both elm and oak,
> but saw no difference in my results.
>
> I believe the clue is in this line from my trial submissions:
>
> queuename                      qtype resv/used/tot. load_avg arch          states
> ---------------------------------------------------------------------------------
> [email protected]              BIP   0/2/2          0.06     linux-x64
>      35 0.55500 Sleeper    skip         r     04/07/2012 12:42:02     1
>      36 0.55500 Sleeper    skip         r     04/07/2012 12:42:02     1
> ---------------------------------------------------------------------------------
> [email protected]              BIP   0/0/1          -NA-     -NA-          au   <<<<
>
> What does "state = au" mean?

`man qstat` (section "Full Format (with -f and -F)"):

· the state of the queue - one of u(nknown) if the corresponding ge_execd(8)
  cannot be contacted, a(larm), A(larm), C(alendar suspended), s(uspended),
  S(ubordinate), d(isabled), D(isabled), E(rror) or combinations thereof.

So "au" means the queue on oak is in alarm state and its execution daemon cannot be contacted.

> Why is "tot = 2" for elm and "tot = 1" for oak when they are identical hosts
> configured the same way?

If you check the queue configuration, there will be something like:

$ qconf -sq all.q
...
slots                 1,[elm=2]

You can change it to read:

$ qconf -sq all.q
...
slots                 2

as both machines have two cores, if I understand you correctly.

On elm you granted access to another execution daemon, but unless the execution daemon on oak contacts the qmaster, the qmaster is not aware that it is alive.
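As a sketch of the slots change (the commands are standard SGE `qconf` usage, but they need a running qmaster, so run them on your own cluster; `qconf -mq all.q` opens the queue in an editor, while `-mattr` changes a single attribute non-interactively):

```shell
# Show the current slot definition of the cluster queue.
# With the per-host override you would see something like:
#   slots                 1,[elm.tdi.local=2]
qconf -sq all.q | grep slots

# Replace it with a flat value of 2 slots on every host in the
# queue, so oak gets 2 slots as well (non-interactive alternative
# to editing the queue with "qconf -mq all.q").
qconf -mattr queue slots 2 all.q
```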
-- Reuti

> Thanks,
>
> Skip
>
> On Sat, Apr 7, 2012 at 11:14 AM, Skip Coombe <[email protected]> wrote:
> Hi Reuti,
>
> On Sat, Apr 7, 2012 at 8:27 AM, Reuti <[email protected]> wrote:
> Hi,
>
> On 07.04.2012 at 03:54, Skip Coombe wrote:
>
> > (Sorry for the incomplete message.)
> >
> > I set up 2 hosts in one cluster on CentOS 5.4:
> >
> > Linux version 2.6.18-308.1.1.el5 ([email protected]) (gcc
> > version 4.1.2 20080704 (Red Hat 4.1.2-52)) #1 SMP Wed Mar 7 04:16:51 EST 2012
> > Linux elm.tdi.local 2.6.18-308.1.1.el5 #1 SMP Wed Mar 7 04:16:51 EST 2012
> > x86_64 x86_64 x86_64 GNU/Linux
> >
> > ge2011.11 installed with ge2011.11-x64.tar with mostly default values
> > except db=classic (same domain)
> >
> > (all cmds on elm.tdi.local)
> >
> > $ qconf -sel
> > elm.tdi.local
> > oak.tdi.local
> >
> > but
> >
> > $ qconf -sconf oak
> > configuration oak.tdi.local not defined
>
> If all the machines have the same OS, you don't need any local configuration
> at all. In fact, it often leads to confusion about which settings are used in the end.
>
> $ qconf -dconf elm
>
> This removes the local configuration from elm. Then the global configuration
> (qconf -sconf) is used for both machines.
>
> I will experiment with this, although I am unclear about the process. The
> only configuration I did after identical installation on elm and oak was to
> add oak as an execution host on elm with qconf -Ae ${oak-execution-host-spec}.
>
> How did you install SGE on each of them? Do they share a common directory on
> both machines? Did you start the execd on oak by hand via /etc/init.d/sgeexecd?
>
> Both installations were done identically using the scripts install_qmaster
> followed by install_execd from a common pathname (/opt/SGE/ge2011.11), but on
> separate hosts.
>
> /opt/SGE/ge2011.11/bin/linux-x64/sge_qmaster
> /opt/SGE/ge2011.11/bin/linux-x64/sge_execd
>
> were both started by the installation scripts and are running on both hosts.
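To diagnose why the execd on oak never reports in, a few checks along these lines can help (a sketch only: the spool path assumes the default cell "default" with $SGE_ROOT=/opt/SGE/ge2011.11, 6445 is the default execd port, and all of this needs the live cluster):

```shell
# On oak: is the execution daemon actually running?
ps -e | grep sge_execd

# On oak: inspect the execd messages file for errors contacting
# the qmaster (name resolution and port problems show up here).
tail /opt/SGE/ge2011.11/default/spool/oak/messages

# On elm: use SGE's own ping utility to check whether the execd
# on oak is reachable on its port at all.
qping -info oak.tdi.local 6445 execd 1
```

If `qping` times out, the usual suspects are a firewall blocking the execd/qmaster ports or hostnames resolving differently on the two machines.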
> Skip
>
> -- Reuti
>
> > I issued "qsub sleeper.sh 300" 6 times and expected to see 2 jobs being
> > executed on each host, but:
> >
> > $ qstat -f
> > queuename                      qtype resv/used/tot. load_avg arch          states
> > ---------------------------------------------------------------------------------
> > [email protected]              BIP   0/2/2          0.19     linux-x64
> >      29 0.55500 Sleeper    skip         r     04/06/2012 15:31:32     1
> >      30 0.55500 Sleeper    skip         r     04/06/2012 15:31:32     1
> > ---------------------------------------------------------------------------------
> > [email protected]              BIP   0/0/1          -NA-     -NA-          au
> >
> > ############################################################################
> >  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
> > ############################################################################
> >      31 0.55500 Sleeper    skip         qw    04/06/2012 15:31:26     1
> >      32 0.55500 Sleeper    skip         qw    04/06/2012 15:31:27     1
> >      33 0.55500 Sleeper    skip         qw    04/06/2012 15:31:28     1
> >      34 0.55500 Sleeper    skip         qw    04/06/2012 15:31:29     1
> >
> > [skip@elm jobs]$ qstat -F
> > queuename                      qtype resv/used/tot. load_avg arch          states
> > ---------------------------------------------------------------------------------
> > [email protected]              BIP   0/2/2          0.19     linux-x64
> >     hl:load_avg=0.190000
> >     hl:load_short=0.290000
> >     hl:load_medium=0.190000
> >     hl:load_long=0.150000
> >     hl:arch=linux-x64
> >     hl:num_proc=2
> >     hl:mem_free=2.969G
> >     hl:swap_free=5.750G
> >     hl:virtual_free=8.718G
> >     hl:mem_total=3.796G
> >     hl:swap_total=5.750G
> >     hl:virtual_total=9.546G
> >     hl:mem_used=846.965M
> >     hl:swap_used=160.000K
> >     hl:virtual_used=847.121M
> >     hl:cpu=1.000000
> >     hl:m_topology=SCC
> >     hl:m_topology_inuse=SCC
> >     hl:m_socket=1
> >     hl:m_core=2
> >     hl:np_load_avg=0.095000
> >     hl:np_load_short=0.145000
> >     hl:np_load_medium=0.095000
> >     hl:np_load_long=0.075000
> >     qf:qname=all.q
> >     qf:hostname=elm.tdi.local
> >     qc:slots=0
> >     qf:tmpdir=/tmp
> >     qf:seq_no=0
> >     qf:rerun=0.000000
> >     qf:calendar=NONE
> >     qf:s_rt=infinity
> >     qf:h_rt=infinity
> >     qf:s_cpu=infinity
> >     qf:h_cpu=infinity
> >     qf:s_fsize=infinity
> >     qf:h_fsize=infinity
> >     qf:s_data=infinity
> >     qf:h_data=infinity
> >     qf:s_stack=infinity
> >     qf:h_stack=infinity
> >     qf:s_core=infinity
> >     qf:h_core=infinity
> >     qf:s_rss=infinity
> >     qf:h_rss=infinity
> >     qf:s_vmem=infinity
> >     qf:h_vmem=infinity
> >     qf:min_cpu_interval=00:05:00
> >      29 0.55500 Sleeper    skip         r     04/06/2012 15:31:32     1
> >      30 0.55500 Sleeper    skip         r     04/06/2012 15:31:32     1
> > ---------------------------------------------------------------------------------
> > [email protected]              BIP   0/0/1          -NA-     -NA-          au
> >     qf:qname=all.q
> >     qf:hostname=oak.tdi.local
> >     qc:slots=1
> >     qf:tmpdir=/tmp
> >     qf:seq_no=0
> >     qf:rerun=0.000000
> >     qf:calendar=NONE
> >     qf:s_rt=infinity
> >     qf:h_rt=infinity
> >     qf:s_cpu=infinity
> >     qf:h_cpu=infinity
> >     qf:s_fsize=infinity
> >     qf:h_fsize=infinity
> >     qf:s_data=infinity
> >     qf:h_data=infinity
> >     qf:s_stack=infinity
> >     qf:h_stack=infinity
> >     qf:s_core=infinity
> >     qf:h_core=infinity
> >     qf:s_rss=infinity
> >     qf:h_rss=infinity
> >     qf:s_vmem=infinity
> >     qf:h_vmem=infinity
> >     qf:min_cpu_interval=00:05:00
> >
> > ############################################################################
> >  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
> > ############################################################################
> >      31 0.55500 Sleeper    skip         qw    04/06/2012 15:31:26     1
> >      32 0.55500 Sleeper    skip         qw    04/06/2012 15:31:27     1
> >      33 0.55500 Sleeper    skip         qw    04/06/2012 15:31:28     1
> >      34 0.55500 Sleeper    skip         qw    04/06/2012 15:31:29     1
> >
> > Also, qmon cluster conf (on elm) only shows elm, but has both hosts in
> > the execution hosts list and has a host group containing both named
> > "@allhosts".
> >
> > I'm probably overlooking something obvious. Any help will be appreciated.
>
> --
> Skip Coombe
> [email protected]
> 919.442.VLSI

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
